Analytical processing of Linked Data using OLAP




    Hiroyuki Inoue, SWIM Seminar, 16th May 2012
Agenda
Background
Objective
Related work
Proposal
Experiments
Conclusion
Future work

Semantic Web and RDF
Semantic Web
  - A movement to let computers understand the meaning of resources on the Web, much as humans do, by attaching metadata to them.
    - Resources: documents, images, …

RDF (Resource Description Framework)
  - A basic model for describing information
  - Triple: subject, predicate, object
    - The 'predicate' names an attribute or property of the 'subject', and the 'object' carries its value
    - Triples link up into a graph structure

Example (from the figure): the resource http://www.tsukuba.ac.jp/ --has_title--> "Univ. of Tsukuba"
(Oval: resource, rectangle: value (string))
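To make the triple concrete, here is a minimal sketch in Python with the rdflib library that builds the same statement as the figure; the full predicate URI http://example.org/has_title is my own placeholder, since the slide only gives the name has_title.

```python
# Minimal sketch of the triple from the figure, using rdflib.
# The predicate URI "http://example.org/has_title" is hypothetical; the slide
# only names the predicate "has_title" without a namespace.
from rdflib import Graph, URIRef, Literal

g = Graph()
subject = URIRef("http://www.tsukuba.ac.jp/")        # resource (oval)
predicate = URIRef("http://example.org/has_title")   # property
obj = Literal("Univ. of Tsukuba")                    # value / string (rectangle)

g.add((subject, predicate, obj))                     # one RDF triple
print(g.serialize(format="turtle"))                  # Turtle form of the graph
```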
Linked Data
- Objective: enable people and organizations to share structured data on the Web.
  - Linked Data uses RDF to describe the attributes and properties of data
  - Linked Data encourages people to use data more effectively

- Features
  - Uses URIs and HTTP, so data can be referenced on the Web
  - Uses standardized technologies (RDF, URI, SPARQL)

- Linked Open Data
  - Various data sets contain numerical and statistical values
    - People, companies, biology, medicine, music, weather, …
    - 295 data sets, 31 billion triples, hundreds of millions of links1

1: http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
It is necessary to be able to apply analytical operations to the numerical
and statistical data that are published as Linked Data.
OLAP (On-Line Analytical Processing)
- An analytical method for large volumes of accumulated data
  - It can answer complex, statistical queries

- Multi-dimensional model
  - Data cube
    - Numerical values (facts)
    - Axes for analysis (dimensions), each with a hierarchical structure

[Figure: a data cube with three dimensions: Place (East: Aomori, Sendai, Tokyo; West: Osaka, Hiroshima, Fukuoka), Category (PC: Desktop, Laptop; OS: Win, Mac), and Time (first half: Q1, Q2; latter half: Q3, Q4); each cell holds a numerical fact]
Figure: http://www.atmarkit.co.jp/fwin2ktutor/sql02/sql02_03.html
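The cube can be pictured as facts keyed by one member per dimension; the sketch below (my own illustration, with invented numbers) rolls facts up along the Time hierarchy from quarters to half-years.

```python
# Minimal sketch of a data cube as facts keyed by (place, category, quarter),
# rolled up along the Time hierarchy (quarter -> half-year). Values are invented.
from collections import defaultdict

facts = {
    ("Tokyo", "Laptop", "Q1"): 100,
    ("Tokyo", "Laptop", "Q2"): 64,
    ("Osaka", "Desktop", "Q3"): 128,
    ("Osaka", "Desktop", "Q4"): 2,
}

# Time hierarchy: each quarter belongs to a half-year.
quarter_to_half = {"Q1": "first half", "Q2": "first half",
                   "Q3": "latter half", "Q4": "latter half"}

rollup = defaultdict(int)
for (place, category, quarter), value in facts.items():
    rollup[(place, category, quarter_to_half[quarter])] += value

for key, total in sorted(rollup.items()):
    print(key, total)
```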
OLAP with a Relational Database
- Uses data stored in a relational database
- Star schema
  - Numerical values (fact table)
  - Axes for analysis (dimension tables)

Star schema from the figure:
  - Fact: Result (ID, category_id, time_id, store_id, sales volume)
  - Dim: Category (ID, large category, small category)
  - Dim: Time (ID, half, quarter)
  - Dim: Place (ID, NS, city name)

[Figure: the same data cube as on the previous slide]
Figure: http://www.atmarkit.co.jp/fwin2ktutor/sql02/sql02_03.html
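A sketch of how this star schema could be declared in SQL, via Python's sqlite3; the table and column names follow the figure (the Time dimension is renamed time_dim for SQL), while the column types are my assumptions.

```python
# Sketch of the star schema from the figure, using SQLite via Python.
# Column types are assumptions; names follow the slide (Time -> time_dim).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, large_category TEXT, small_category TEXT);
CREATE TABLE time_dim (id INTEGER PRIMARY KEY, half TEXT, quarter TEXT);
CREATE TABLE place    (id INTEGER PRIMARY KEY, ns TEXT, city_name TEXT);
CREATE TABLE result (                       -- fact table
    id           INTEGER PRIMARY KEY,
    category_id  INTEGER REFERENCES category(id),
    time_id      INTEGER REFERENCES time_dim(id),
    store_id     INTEGER REFERENCES place(id),
    sales_volume INTEGER
);
""")
print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")])
```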
Objective and problems
Objective
  - Propose a method for applying OLAP to typical numerical or statistical data published as Linked Data

Problems
  - It is difficult to analyze RDF with OLAP directly
    - The graph data must be converted for analysis on an existing OLAP system
  - Axes and hierarchies must be prepared for analysis
    - OLAP needs axes and hierarchies for analysis
Related work
Kämpgen et al.1
  - Proposed a method for converting Linked Data for OLAP
  - Uses the RDF Data Cube (QB) vocabulary
    - An RDF vocabulary for describing a data cube in RDF
    - Linked Data that follows the QB vocabulary is converted to an RDB
    - Hierarchical structures are also written in RDF

  - However, only a few Linked Data sets follow the QB vocabulary

1. B. Kämpgen and A. Harth. Transforming Statistical Linked Data for Use in OLAP Systems.
I-SEMANTICS 2011, 7th International Conference on Semantic Systems, 2011.
In this research, we propose a method
to apply OLAP to TYPICAL Linked Data




Approaches

- Map RDF (graph structure) to a relational schema

- Create hierarchical structures from the data and its links,
  semi-automatically (using features of RDF and Linked Data)
From retrieval of data to schema creation

1. Retrieval of RDF Data
2. Store RDF Data to RDB
3. Selection of analysis target
4. Create dimension table
5. Create schema for OLAP
1. Retrieval of RDF Data

Retrieve the RDF data for analysis
  - Use an RDF dump
  - Reference by URI (dereferencing)
    - Retrieve resources in the same host recursively
    - Also retrieve outside resources that are used as objects

[Figure: resources for analysis (www.example.com) linked to outside resources (www.w3.or…)]
2. Store RDF Data to RDB (1/2)

Store triples into the RDB grouped by "rdf:type"
  - We cannot decide a table schema in advance
    - RDF data does not have to follow a schema
    - so triples are stored in a vertical representation1

1. D. J. Abadi et al., 2007.

[Figure: an example RDF graph: result_1 and result_2 have rdf:type Result; result_1 has time time_1, category computer, volume "190,000", store Tsukuba; result_2 has time time_2, volume "4,000", store Mito]

"Result" table (vertical)
  subject    predicate   object      type
  result_1   time        time_1      resource
  result_1   category    computer    resource
  result_1   volume      "190,000"   literal
  result_1   store       Tsukuba     resource
  result_2   time        time_2      resource
  …
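The following is a minimal sketch of this storage step, not the authors' implementation: triples are parsed with rdflib and written into one vertical (subject, predicate, object, type) table per rdf:type; the ex: namespace and the tiny Turtle snippet are invented to mirror the figure.

```python
# Sketch: store triples into one vertical (subject, predicate, object, type)
# table per rdf:type, as on this slide. The ex: data below is invented to
# mirror the figure; this is not the authors' actual implementation.
import sqlite3
from rdflib import Graph, Literal
from rdflib.namespace import RDF

turtle = """
@prefix ex: <http://example.org/> .
ex:result_1 a ex:Result ;
    ex:time ex:time_1 ; ex:category ex:computer ;
    ex:volume "190,000" ; ex:store ex:Tsukuba .
ex:result_2 a ex:Result ;
    ex:time ex:time_2 ; ex:volume "4,000" ; ex:store ex:Mito .
"""
g = Graph()
g.parse(data=turtle, format="turtle")

conn = sqlite3.connect(":memory:")
for rdf_class in set(g.objects(None, RDF.type)):             # one table per rdf:type
    table = str(rdf_class).rsplit("/", 1)[-1]                # e.g. "Result"
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" '
                 "(subject TEXT, predicate TEXT, object TEXT, type TEXT)")
    for subj in g.subjects(RDF.type, rdf_class):
        for pred, obj in g.predicate_objects(subj):
            if pred == RDF.type:
                continue
            kind = "literal" if isinstance(obj, Literal) else "resource"
            conn.execute(f'INSERT INTO "{table}" VALUES (?, ?, ?, ?)',
                         (str(subj), str(pred), str(obj), kind))

print(conn.execute('SELECT * FROM "Result"').fetchall())
```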
2. Store RDF Data to RDB (2/2)

Convert the vertical table into a horizontal one
  - so that we can see which attributes and properties exist, and obtain a schema
    (PK: primary key, FK: foreign key)

Obtained schema: Sales result (subject[PK], time[FK], category[FK], volume, store[FK])

"Sales result" table (vertical)
  subject    predicate   object      type
  result_1   time        time_1      resource
  result_1   category    computer    resource
  result_1   volume      "190,000"   literal
  result_1   store       Tsukuba     resource
  result_2   time        time_2      resource
  result_2   volume      "4,000"     literal
  result_2   store       Mito        resource

"Sales volume" table (horizontal)
  subject[PK]   time[FK]   category[FK]   volume      store[FK]
  result_1      time_1     computer       "190,000"   Tsukuba
  result_2      time_2     null           "4,000"     Mito
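One way to carry out this pivot is conditional aggregation in SQL; the sketch below (my own, in SQLite) rebuilds the horizontal table from the vertical rows shown above, with the predicate set discovered from the data.

```python
# Sketch: pivot the vertical (subject, predicate, object) table into a
# horizontal table with one column per predicate, using conditional
# aggregation in SQLite. Predicates follow the example on this slide.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_result (subject TEXT, predicate TEXT, object TEXT, type TEXT)")
conn.executemany("INSERT INTO sales_result VALUES (?, ?, ?, ?)", [
    ("result_1", "time",     "time_1",   "resource"),
    ("result_1", "category", "computer", "resource"),
    ("result_1", "volume",   "190,000",  "literal"),
    ("result_1", "store",    "Tsukuba",  "resource"),
    ("result_2", "time",     "time_2",   "resource"),
    ("result_2", "volume",   "4,000",    "literal"),
    ("result_2", "store",    "Mito",     "resource"),
])

# The distinct predicates tell us which columns (attributes) exist.
predicates = [r[0] for r in conn.execute(
    "SELECT DISTINCT predicate FROM sales_result")]

cols = ", ".join(
    f"MAX(CASE WHEN predicate = '{p}' THEN object END) AS \"{p}\""
    for p in predicates)
for row in conn.execute(
        f"SELECT subject, {cols} FROM sales_result GROUP BY subject"):
    print(row)   # missing predicates (e.g. category for result_2) come out as None/null
```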
3. Selection of analysis target

The user chooses which value is targeted by the analytical operation.

Candidate tables (example):
  - Sales volume: subject, category_ID, time_ID, store_ID, volume
  - Stock: subject, product_name, category_ID, store_ID, stock numbers
  - Visitor: subject, store_ID, time_ID, visitor counter
  - Time: subject, date, time
  - Category: subject, name
  - Store: subject, location_ID
  - gn:location: subject, name
4. Create dimension table

Create the dimension tables for OLAP.
We propose three creation methods (method 1 is sketched in code below):
  1. Use literals that are written directly in the data
  2. Use a layer (hierarchy) structure that is written directly in the data
  3. Use a layer structure from outside data

[Figure: the fact table Sales volume (subject, category_ID, time_ID, store_ID, volume) linked to Time (subject, date, time), category (subject, name), and store (subject, location_ID), which in turn links to the outside resource gn:location (subject, name). Blue: outside resources]
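As an illustration of creation method 1, the sketch below derives a Time dimension hierarchy (year, month, day, hour) from literal timestamp values; the timestamps are invented and the code is mine, not the authors'.

```python
# Sketch of creation method 1: build a Time dimension (hierarchy
# year > month > day > hour) directly from literal timestamp values.
# The timestamps are invented examples.
from datetime import datetime

time_literals = ["2011-03-15T22:00:00", "2011-03-16T09:00:00"]

time_dimension = []
for i, literal in enumerate(time_literals, start=1):
    t = datetime.fromisoformat(literal)
    time_dimension.append({
        "subject": f"time_{i}",
        "hour":  t.hour,
        "day":   t.day,
        "month": t.month,
        "year":  t.year,
    })

for row in time_dimension:
    print(row)
```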
3. Use layer structure from outside data
  - Use other datasets that provide a layer (hierarchical) structure

[Figure: the target dataset's resources Tsukuba and Mito link to the outside resource Ibaraki, which in turn links to Japan: a layer structure inside the outside dataset]

e.g.) GeoNames
  - Provides a geographically layered structure
  - Tsukuba / Ibaraki / Japan / Asia
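A sketch of how such a hierarchy could be read out of GeoNames RDF by following the GeoNames ontology's gn:parentFeature and gn:name properties; the slides do not say which properties the authors actually traversed, and the dump file name is hypothetical.

```python
# Sketch: derive a geographic hierarchy (e.g. Tsukuba > Ibaraki > Japan > Asia)
# from GeoNames RDF by following gn:parentFeature links. The dump file name is
# hypothetical; the slide does not state which property the authors followed.
from rdflib import Graph, URIRef, Namespace

GN = Namespace("http://www.geonames.org/ontology#")

g = Graph()
g.parse("geonames_dump.rdf")          # hypothetical local dump of GeoNames RDF

def layer_structure(feature_uri: str, max_depth: int = 6):
    """Return the chain of gn:name values from a feature up through its ancestors."""
    chain, node = [], URIRef(feature_uri)
    for _ in range(max_depth):
        name = g.value(node, GN.name)
        if name is not None:
            chain.append(str(name))
        parent = g.value(node, GN.parentFeature)
        if parent is None:
            break
        node = parent
    return chain

print(layer_structure("http://sws.geonames.org/2111833/"))
```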
5. Create schema for OLAP

e.g.) the case where "sales volume" is selected as the measure
  - fact table) Results
  - dimension tables) Time, Category, Store-gn:place

Derived star schema:
  - Fact: Results (subject, category_subject, time_subject, store_subject, volume)
  - Dim: Time (subject, hour, day, month, quarter, year)
  - Dim: Category (subject, name)
  - Dim: Store-gn:place (subject, L1(district), L2(prefecture), L3(country), L4(continent))
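To show how the derived schema would be queried, here is a sketch of a roll-up style aggregation over it (total volume per prefecture and month); identifiers are adapted from the slide to plain SQL names, and the rows are invented.

```python
# Sketch: an analytical (roll-up style) query over the derived star schema on
# this slide. Table/column names are adapted from the slide; rows are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim    (subject TEXT PRIMARY KEY, hour INT, day INT, month INT, quarter INT, year INT);
CREATE TABLE category    (subject TEXT PRIMARY KEY, name TEXT);
CREATE TABLE store_place (subject TEXT PRIMARY KEY, l1_district TEXT, l2_prefecture TEXT, l3_country TEXT, l4_continent TEXT);
CREATE TABLE results (                      -- fact table
    subject TEXT PRIMARY KEY,
    category_subject TEXT REFERENCES category(subject),
    time_subject     TEXT REFERENCES time_dim(subject),
    store_subject    TEXT REFERENCES store_place(subject),
    volume INTEGER
);
INSERT INTO time_dim    VALUES ('time_1', 22, 15, 3, 1, 2011), ('time_2', 9, 16, 3, 1, 2011);
INSERT INTO category    VALUES ('computer', 'computer');
INSERT INTO store_place VALUES ('Tsukuba', 'Tsukuba', 'Ibaraki', 'Japan', 'Asia'),
                               ('Mito',    'Mito',    'Ibaraki', 'Japan', 'Asia');
INSERT INTO results     VALUES ('result_1', 'computer', 'time_1', 'Tsukuba', 190000),
                               ('result_2', NULL,       'time_2', 'Mito',      4000);
""")

# Roll sales volume up to (prefecture, month) along the Place and Time hierarchies.
for row in conn.execute("""
    SELECT p.l2_prefecture, t.month, SUM(r.volume) AS total_volume
    FROM results r
    JOIN store_place p ON p.subject = r.store_subject
    JOIN time_dim    t ON t.subject = r.time_subject
    GROUP BY p.l2_prefecture, t.month
"""):
    print(row)
```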
Experiments
Objective
  - Convert the data and create a schema for OLAP from numerical/statistical data that is published as Linked Data

Exp. 1) Radiation observation data
  - National Radioactivity Stat as Linked Data1
    - A dataset derived from the Environmental Radioactivity Level Survey (環境放射能水準調査) by MEXT (文部科学省)
    - The survey2 published by MEXT, converted into RDF

Exp. 2) Weather observation data
  - Linked Sensor/Observation Data3
    - Observatory metadata for more than 200,000 locations
    - Hurricane observation data from these observatories

1 http://www.kanzaki.com/works/2011/stat/ra/
2 http://radioactivity.mext.go.jp/ja/monitoring_by_prefecture/
3 http://wiki.knoesis.org/index.php/LinkedSensorData
Exp. 1) Results (1/2)
  - Obtained radiation observation data by crawling
    - Number of triples: 1,003,410 (March–December 2011)
    - Geographic info: used dumped data from GeoNames

  - Candidates for the measure
    - Observation instance, time, location

[Figure: an observation instance ra:20110315/p02/t20 with rdf:value "0.040" (value), ev:place gn:2111833 (location), and ev:time time:20110315T22PT1H (time); the time resource has tl:at "2011-04-14T00:00:00"^^xsd:dateTime]

"ra" is "http://www.kanzaki.com/works/2011/stat/ra/", "time" is "http://www.kanzaki.com/works/2011/stat/dim/d/",
"gn" is "http://sws.geonames.org/", "ev" is "http://purl.org/NET/c4dm/event.owl#",
"tl" is "http://purl.org/NET/c4dm/timeline.owl#", "rdf" is "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
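Based on the predicates in the figure (rdf:value, ev:place, ev:time, tl:at), the relevant facts could be pulled out with a query such as the sketch below; the slides do not show the authors' actual queries, and the dump file name is hypothetical.

```python
# Sketch: extract (observation, value, place, observation time) from the
# radiation dataset using the predicates shown in the figure above.
# The data file name is hypothetical.
from rdflib import Graph

g = Graph()
g.parse("radioactivity_dump.ttl", format="turtle")   # hypothetical crawled dump

query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ev:  <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl:  <http://purl.org/NET/c4dm/timeline.owl#>

SELECT ?obs ?value ?place ?at WHERE {
    ?obs rdf:value ?value ;
         ev:place  ?place ;
         ev:time   ?time .
    OPTIONAL { ?time tl:at ?at }
}
LIMIT 10
"""
for row in g.query(query):
    print(row.obs, row.value, row.place, row.at)
```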
Exp. 1) Results (2/2)
  - Created a schema with "value" as the measure
    - fact table) an integrated table containing the observed value
    - dim. tables)
      1. Time (hierarchy created from the observation time)
      2. Location (hierarchy obtained from GeoNames)

Derived star schema:
  - Fact: observation_instance (subject, place_subject, time_subject, value)
  - Dim: Time (subject, sec, min, hour, day, month, year)
  - Dim: Location (subject, layer_1(district), layer_2(prefecture), layer_3(country), layer_4(continent))

  - The observed value came through as a string type
    - The user had to supply the proper data type
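Because the value arrived as a string, a cast is needed before any aggregation; the sketch below illustrates the point in SQLite with invented rows, and is not the authors' code.

```python
# Sketch: the observed value arrived as a string, so it must be cast to a
# numeric type before aggregation. The table follows the schema above;
# the rows are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation_instance (subject TEXT, place_subject TEXT, time_subject TEXT, value TEXT)")
conn.executemany("INSERT INTO observation_instance VALUES (?, ?, ?, ?)", [
    ("obs_1", "gn:2111833", "time_1", "0.040"),
    ("obs_2", "gn:2111833", "time_2", "0.038"),
])
print(conn.execute(
    "SELECT place_subject, AVG(CAST(value AS REAL)) FROM observation_instance "
    "GROUP BY place_subject").fetchall())
```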
Exp. 2) Results (1/2)
Used dumped data
  - Hurricane Bill (17–22 Aug. 2009)
    - Number of triples: 231,021,108 / number of observations: 21,272,790
    - Geographic info: used dumped data from GeoNames

Candidates for the measure
  - time, value, coordinates (lat., long.)

Created a schema with "value" as the measure
  - fact table: integrated table (observation instance and value)
  - dimension tables:
    1. observation time (hierarchy obtained from the time)
    2. observatory (hierarchy obtained from GeoNames)
[Figure: structure of the Linked Sensor Data / Linked Observation Data graph.
An observation instance has om-owl:procedure pointing to an om-owl:System (the sensor system),
om-owl:samplingTime pointing to a time:Instant (the observation time, with time:inXSDDateTime "2004-08-10T16:10:00-06:00"^^xsd:dateTime),
and om-owl:result pointing to an om-owl:MeasureData node carrying the observed value via om-owl:floatValue ("81.3"^^xsd:float).
The om-owl:System links back to its observations via om-owl:generatedObservation,
has om-owl:processLocation pointing to a wgs84:Point with wgs84:lat "45.0397"^^xsd:float, wgs84:long "-121.6736"^^xsd:float, and wgs84:alt "3780"^^xsd:float,
and has om-owl:hasLocatedNearRel pointing to an om-owl:LocatedNearRel whose om-owl:hasLocation is a gn:Feature (the observatory).
The system and its location belong to Linked Sensor Data; the observations and values belong to Linked Observation Data.]

"wgs84"  is short for "http://www.w3.org/2003/01/geo/wgs84_pos#"
"time"   is short for "http://www.w3.org/2006/time#"
"gn"     is short for "http://www.geonames.org/ontology#"
"om-owl" is short for "http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#"
"xsd"    is short for "http://www.w3.org/2001/XMLSchema#"
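Following the property chain in this figure, observations and their values could be extracted with a query like the sketch below; again the dump file name is hypothetical and the slides do not show the authors' queries.

```python
# Sketch: extract (sensor system, sampling time, observed float value) from
# Linked Observation Data, following the property chain in the figure above.
# The dump file name is hypothetical.
from rdflib import Graph

g = Graph()
g.parse("hurricane_bill_dump.nt", format="nt")   # hypothetical dumped data

query = """
PREFIX om-owl: <http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#>
PREFIX time:   <http://www.w3.org/2006/time#>

SELECT ?system ?sampled ?value WHERE {
    ?obs om-owl:procedure    ?system ;
         om-owl:samplingTime ?instant ;
         om-owl:result       ?result .
    ?instant time:inXSDDateTime ?sampled .
    ?result  om-owl:floatValue  ?value .
}
LIMIT 10
"""
for row in g.query(query):
    print(row.system, row.sampled, row.value)
```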
Exp. 2) Results (2/2)
  - Created schema
    - Fact: Observation_MeasureData (subject, procedure (system_subject), samplingTime (Instant_subject), floatValue)
    - Dim: Instant (Time) (subject, sec, min, hour, day, month, year)
    - Dim: System_LocatedNearRel (observatory, its surroundings, and others)
      (subject, ID(Name), Source URI, wgs84:alt, wgs84:lat, wgs84:long,
       layer_1(town), layer_2(district), layer_3(prefecture), layer_4(country), layer_5(continent))

  - Problem
    - a vanished ontology (could not be accessed)
      - the "weather:" ontology*

* http://knoesis.wright.edu/ssw/ont/weather.owl#
Conclusion
Proposed a method to apply OLAP to typical Linked Data (numerical and statistical data)
  - The processing exploits features of RDF and Linked Data
    - layered structures are obtained from inside and outside the data
    - and turned into hierarchies (dimension tables) for OLAP

Applied the method to two observation datasets
  - converted the data and prepared analysis axes once a target was chosen
  - created a star schema for OLAP
Future work
- Revise the method
  - the case where a subject has many objects for the same predicate
  - the case where a subject has several rdf:type values
  - the case where a subject has no rdf:type

- To analyze more datasets
  - handle a larger number of ontologies
  - provide a mechanism for using outside resources
    - the definitions of layered structures differ between datasets

- Apply the method to, and verify it on, many more kinds of data in other domains
Thank you for your attention!




