Binary RDF for Scalable Publishing,
   Exchanging and Consumption
        in the Web of Data

Javier D. Fernández
Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez

                                        University of Valladolid (Spain)
                                            University of Chile (Chile)




PhD Symposium
Brief RDF Introduction

(1) Resource Description Framework
     Webs, services, protocols
     Persons, Proteins, geography…


(2) A standard model for data exchange on the Web
    Understandable by computers


(3) W3C Recommendation (2004)

(4) Data model
    (subject, predicate, object)


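RDF Example

Subject, Predicate, Object
(U,B)  , U        , (U,B,L)        U = URI, B = blank node, L = literal

[Figure: example RDF graph. Nodes: <http://books/book21> (URI), <http://books/author33> (URI),
 <http://myblog/lectures> (URI), _collection (blank node), “Pablo Neruda” (literal),
 “Spain in the Heart” (literal). Edge label shown: lectures:to_read_list]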
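
The position rules in the legend (subjects are URIs or blank nodes, predicates are URIs, objects may also be literals) can be sketched in a few lines of Python. The names `Term`, `ALLOWED` and `is_valid_triple` are illustrative assumptions and do not come from any RDF library; prefixed names are kept as plain strings.

```python
from typing import NamedTuple

class Term(NamedTuple):
    kind: str   # "U" = URI, "B" = blank node, "L" = literal
    value: str

# Allowed term kinds per triple position, following the (U,B), U, (U,B,L) legend.
ALLOWED = {"subject": {"U", "B"}, "predicate": {"U"}, "object": {"U", "B", "L"}}

def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    """Return True if every term kind is allowed in its position."""
    return (s.kind in ALLOWED["subject"]
            and p.kind in ALLOWED["predicate"]
            and o.kind in ALLOWED["object"])

book = Term("U", "http://books/book21")
title = Term("U", "dc:title")            # prefixed name used as-is for brevity
name = Term("L", "Spain in the Heart")

print(is_valid_triple(book, title, name))   # True
print(is_valid_triple(name, title, book))   # False: a literal cannot be a subject
```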

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.




Image: Danilo Rizzuti / FreeDigitalPhotos.net
Scalability problems



                    DBpedia (en)   233 M triples   ~ 33 GB
                    UniProt        845 M triples   ~ 230 GB



   Publish?
   Exchange?
   Process/Consume/Query?
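
To make the figures above concrete, a rough average size per triple can be worked out. This is a back-of-the-envelope sketch: it assumes 1 GB = 10**9 bytes and that the ditto mark in the UniProt row stands for "M triples".

```python
# Dataset figures from the slide: (number of triples, size in bytes).
# Assumption: 1 GB is taken as 10**9 bytes.
datasets = {
    "DBpedia (en)": (233e6, 33e9),
    "UniProt": (845e6, 230e9),
}

def bytes_per_triple(triples: float, size_bytes: float) -> float:
    """Average serialized size of one triple."""
    return size_bytes / triples

for name, (triples, size) in datasets.items():
    print(f"{name}: ~{bytes_per_triple(triples, size):.0f} bytes per triple")
```

Under these assumptions each triple costs on the order of 140 to 270 bytes in its plain serialization, which is the overhead a compact binary format would aim to reduce.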



RDF Publication



[Figure: publication channels: dereferenceable URIs, RDF dumps, sensors, SPARQL endpoints/APIs]


 No recommendations or methodology for publishing at large scale
 Related Work: Some metadata for discovery, such as VoID and Semantic
  Sitemaps.




RDF Exchanging issues
 RDF/XML, N3, Turtle, JSON.
          Document-centric view (verbose) instead of a data-centric view (machine-oriented)
 No structure (chunks, universal compression)



 Related Work: Universal compression (gzip, bzip2) and the Efficient XML
  Interchange Format (EXI).
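
As a small illustration of universal compression applied to a verbose serialization, the sketch below gzips some made-up N-Triples-style lines with Python's standard `gzip` module. The data is hypothetical and the ratio it prints says nothing about real datasets.

```python
import gzip

# Hypothetical N-Triples-style lines: the long URI prefixes repeat in every statement.
lines = [
    f'<http://books/book{i}> <http://purl.org/dc/elements/1.1/title> "Title {i}" .'
    for i in range(1000)
]
plain = "\n".join(lines).encode("utf-8")
compressed = gzip.compress(plain)

ratio = len(compressed) / len(plain)
print(f"plain: {len(plain)} bytes, gzip: {len(compressed)} bytes, ratio: {ratio:.2f}")

# The triples are only usable again after full decompression (and parsing).
restored = gzip.decompress(compressed)
```

The repeated prefixes compress well, but the output is an opaque byte stream: nothing can be looked up in it without decompressing first.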




Image: renjith krishnan / FreeDigitalPhotos.net
RDF Processing/Consumption (After Exchanging)
 Costly Post-processing
          Decompression
          Indexing (RDF Store)
          Finally… consume


 Related Work (indexing): Based on relational storage (Virtuoso), multi-indexes
  (RDF-3X), distributed systems (MapReduce) and others (BitMat).




The scalability problems have
a major impact on users

         Would you download hundreds of GB...


                                              … if you don’t know exactly what they contain,
                                             if they need costly exchange and post-processing,
                                                and if they require a powerful store to query them?




In the following...
1. Proposed approach for scalable publishing, exchanging and consumption
   of large RDF datasets
2. Preliminary results
3. Methodology
4. On-going work and conclusions




   Image: jscreationzs / FreeDigitalPhotos.net
An integrated solution
We call for, and we study in this thesis, a Binary RDF Serialization format:
     Machine oriented (binary)
     Clean publication
               Metadata
               Modular
     Efficient exchange
               Compression
     Basic data operations
               Easy to parse and consume
               Primitive query resolution




HDT Overview




Dictionary+Triples partition

   1   <http://books/author33>
   2   <http://books/book21>
   3   dc:author
   4   dc:title
   5   foaf:name
   6   “Pablo Neruda”
   7   “Spain in the Heart”

   [Figure: the example graph with each term replaced by its dictionary ID (nodes 1, 2, 6 and 7)]
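
The partition can be sketched as follows: a dictionary maps each term to an integer ID, and the triples are rewritten as ID tuples. The term order follows the slide. The three triples are an assumption, since the slide text lists the predicates but does not spell out the edges.

```python
# Terms in the order shown on the slide; IDs start at 1.
terms = [
    "<http://books/author33>",
    "<http://books/book21>",
    "dc:author",
    "dc:title",
    "foaf:name",
    '"Pablo Neruda"',
    '"Spain in the Heart"',
]
term_to_id = {term: i for i, term in enumerate(terms, start=1)}
id_to_term = {i: term for term, i in term_to_id.items()}

# Assumed triples (illustrative): book -> author, book -> title, author -> name.
triples = [
    ("<http://books/book21>", "dc:author", "<http://books/author33>"),
    ("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
    ("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
]

# Triples component: the same statements, with every term replaced by its ID.
id_triples = [tuple(term_to_id[t] for t in triple) for triple in triples]
print(id_triples)  # [(2, 3, 1), (2, 4, 7), (1, 5, 6)]

# Decoding through the dictionary recovers the original statements.
decoded = [tuple(id_to_term[i] for i in triple) for triple in id_triples]
```

Each long string is stored once in the dictionary, and the graph itself shrinks to small integers.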




Key concepts: The Dictionary

   Largest component (up to 74%)
     Long URIs, shared prefixes
     Lang, datatype tags in literals
   Efficient ID ↔ String operations



We plan to work on a specific organization which
  Optimizes space (regularities)
  Provides efficient performance in operations




Preliminary results in Rich Functional Dictionaries

We propose to adapt techniques for string dictionaries:
  Front-Coding
     Making dictionary partitions
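
Front-Coding stores each string in a sorted list as the length of the prefix it shares with its predecessor plus the remaining suffix. The sketch below is a minimal, unbucketed version with hypothetical function names; it is not the dictionary layout of the cited paper.

```python
from os.path import commonprefix

def front_encode(sorted_terms: list[str]) -> list[tuple[int, str]]:
    """Encode each string as (length of prefix shared with the previous string, suffix)."""
    encoded = []
    previous = ""
    for term in sorted_terms:
        shared = len(commonprefix([previous, term]))
        encoded.append((shared, term[shared:]))
        previous = term
    return encoded

def front_decode(encoded: list[tuple[int, str]]) -> list[str]:
    """Rebuild the strings by reusing the shared prefix of the previous one."""
    decoded = []
    previous = ""
    for shared, suffix in encoded:
        term = previous[:shared] + suffix
        decoded.append(term)
        previous = term
    return decoded

uris = sorted([
    "http://books/author33",
    "http://books/book21",
    "http://myblog/lectures",
])
encoded = front_encode(uris)
print(encoded)
# [(0, 'http://books/author33'), (13, 'book21'), (7, 'myblog/lectures')]
```

Practical schemes usually restart the coding every few strings so that a lookup does not have to decode from the beginning; this sketch leaves that out.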




  [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández,
     Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).

Key concepts: Triples

   Specific compression:
       More efficient compression than just gzip.
   Data indexing for consumption:
       Allows direct pattern resolution without decompression
           (s,p,o), (s,?p,?o) and (s,p,?o)


We plan to work on a specific technique which
  optimizes space
  provides efficient performance in primitive operations
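
The three subject-bound patterns listed above can all be answered by a range search over ID-triples kept in (subject, predicate, object) order. The sketch below uses a plain sorted Python list and binary search as a stand-in for a compressed index; the `match` function and the sample IDs are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right
from typing import Optional

# ID-triples sorted in (subject, predicate, object) order; values are illustrative.
triples = sorted([(1, 5, 6), (2, 3, 1), (2, 4, 7)])

def match(s: int, p: Optional[int] = None, o: Optional[int] = None) -> list:
    """Resolve (s,p,o), (s,p,?o) and (s,?p,?o) by binary search on the sorted list."""
    if p is None and o is not None:
        raise ValueError("(s,?p,o) is not covered by this sketch")
    bound = tuple(x for x in (s, p, o) if x is not None)
    lo = bisect_left(triples, bound)                      # first triple with this prefix
    hi = bisect_right(triples, bound + (float("inf"),))   # just past the last one
    return triples[lo:hi]

print(match(2))        # (s,?p,?o) -> [(2, 3, 1), (2, 4, 7)]
print(match(2, 4))     # (s,p,?o)  -> [(2, 4, 7)]
print(match(2, 3, 1))  # (s,p,o)   -> [(2, 3, 1)]
```

Because the bound components always form a prefix of the sort order, every one of these patterns maps to one contiguous range.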




Preliminary results in Triples Encoding

We propose to use Bitmap indexes:
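
One way to picture a bitmap-based triples encoding is sketched below: triples are grouped by subject, the predicates and objects are written as two flat ID sequences, and two bitmaps mark where each group ends. This is a simplified illustration, not the layout of the cited paper. The function name is hypothetical, subjects are assumed to be the consecutive IDs 1..n so they can stay implicit, and marking the last element of each list with a 1 bit is an assumed convention.

```python
from itertools import groupby

def bitmap_encode(id_triples):
    """Encode ID-triples as two ID sequences (sp, so) plus two bitmaps (bp, bo).

    Assumed convention for this sketch: a 1 bit marks the last element of each list.
    """
    sp, bp, so, bo = [], [], [], []
    for _, by_subject in groupby(sorted(id_triples), key=lambda t: t[0]):
        pairs = [(p, o) for _, p, o in by_subject]
        predicates = sorted({p for p, _ in pairs})
        for i, p in enumerate(predicates):
            sp.append(p)
            bp.append(1 if i == len(predicates) - 1 else 0)  # last predicate of this subject
            objects = [o for q, o in pairs if q == p]
            for j, o in enumerate(objects):
                so.append(o)
                bo.append(1 if j == len(objects) - 1 else 0)  # last object of this (s, p) pair
    return sp, bp, so, bo

sp, bp, so, bo = bitmap_encode([(1, 5, 6), (2, 3, 1), (2, 4, 7)])
print(sp, bp)  # [5, 3, 4] [1, 0, 1]
print(so, bo)  # [6, 1, 7] [1, 1, 1]
```

The subject never appears explicitly: counting 1 bits in `bp` tells which subject a predicate belongs to, which is what lets a subject-bound pattern be resolved directly on the encoded form.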




   [*] Compact Representation of Large RDF Data Sets for Publishing and
       Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez.
       International Semantic Web Conference (ISWC 2010).

Methodology
 RDF structure in theory and practice.
 Binary RDF Specification.
 Succinct Dictionaries.
 Triples Indexes.
 Practical deployment.




Some Results… HDT acknowledged as a W3C Member Submission:
http://www.w3.org/Submission/2011/03/
                                        supported by:




Some Results... HDT for exchanging




Some Results... HDT for consumption
Direct Consumption, without decompression after exchanging
           Example of use: HDT-it (Thanks to Mario Arias, DERI)




On-going promising work: HDT-FoQ




    [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto,
        Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC
        2012). To appear.
In conclusion
Binary RDF aims to lighten the Web of Data:
    Logical decomposition: Header, Dictionary, and Triples
    Clean publication
    Compressed RDF format for exchanging
    Machine-friendly, direct consumption
         Rich Functional Dictionary/Triples representations for querying




Still much work on…
 Getting a global understanding of the real structure of RDF networks.
 Applying this knowledge in innovative dictionary and triples indexes.
     full SPARQL at consumption
 Supporting dynamic operations
     inserting, deleting, and updating binary RDF




Thanks!



HDT:        http://www.rdfhdt.org/
Group: http://dataweb.infor.uva.es/
Slides: http://www.slideshare.net/javifer


                                   Javier D. Fernández (jfergar@infor.uva.es)
                   Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez

                                                   University of Valladolid (Spain)
                                                       University of Chile (Chile)


Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

  • 1. Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data Javier D. Fernández Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez University of Valladolid (Spain) University of Chile (Chile) PhD Symposium
  • 2. Brief RDF Introduction
      (1) Resource Description Framework
          ◦ Webs, services, protocols
          ◦ Persons, proteins, geography…
      (2) A standard model for data exchange on the Web
          ◦ Understandable by computers
      (3) W3C Recommendation (2004)
      (4) Data model
          ◦ (subject, predicate, object)
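  • 3. RDF Example
      ◦ A triple is (Subject, Predicate, Object) with allowed term types (U,B), U, (U,B,L), where U = URI, B = blank node, L = literal
      [Figure: example graph with the URI nodes <http://books/book21>, <http://books/author33> and <http://myblog/lectures>, the literals “Pablo Neruda” and “Spain in the Heart”, the blank node _collection, and the predicate lectures:to_read_list]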
  • 4. Linked Data principles
      1. Use URIs as names for things.
      2. Use HTTP URIs so that people can look up those names.
      3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
      4. Include links to other URIs, so that they can discover more things.
      Image: Danilo Rizzuti / FreeDigitalPhotos.net
  • 5. Image: Danilo Rizzuti / FreeDigitalPhotos.net
  • 6. Scalability problems
      Dataset         Triples          Size
      DBPedia (en)    233 M triples    ~33 GB
      Uniprot         845 M triples    ~230 GB
      ◦ Publish?
      ◦ Exchange?
      ◦ Process/Consume/Query?
  • 7. RDF Publication
      [Figure: dereferenceable URIs, RDF dump, sensor, SPARQL endpoints/APIs]
      ◦ No recommendations or methodology for publishing at large scale
      ◦ Related work: some metadata for discovery, such as VoID and Semantic Sitemaps
  • 8. RDF Exchanging issues
      ◦ RDF/XML, N3, Turtle, JSON
      ◦ Document-centric (verbose) vs. data-centric view (machine)
      ◦ No structure (chunks, universal compression)
      ◦ Related work: universal compression (gzip, bzip2) and the Efficient XML Interchange Format (EXI)
      Image: renjith krishnan / FreeDigitalPhotos.net
  • 9. RDF Processing/Consumption (after exchanging)
      ◦ Costly post-processing
          ◦ Decompression
          ◦ Indexing (RDF store)
          ◦ Finally… consume
      ◦ Related work (indexing): relational storage (Virtuoso), multi-indexes (RDF-3X), distributed systems (MapReduce) and others (BitMat)
      Image: renjith krishnan / FreeDigitalPhotos.net
  • 10. The scalability problems mainly affect users
      Would you download hundreds of GB if you do not know exactly what they contain, if they need costly exchange and post-processing, and if they require a powerful store to query them?
      Image: renjith krishnan / FreeDigitalPhotos.net
  • 11. In the following...
      1. Proposed approach for scalable publishing, exchanging and consumption of large RDF datasets
      2. Preliminary results
      3. Methodology
      4. On-going work and conclusions
      Image: jscreationzs / FreeDigitalPhotos.net
  • 12. An integrated solution
      We call for, and we study in this thesis, a Binary RDF Serialization format:
      ◦ Machine oriented (binary)
      ◦ Clean publication
      ◦ Metadata
      ◦ Modular
      ◦ Efficient exchange
      ◦ Compression
      ◦ Basic data operations
      ◦ Easy to parse and consume
      ◦ Primitive query resolution
      Image: jscreationzs / FreeDigitalPhotos.net
  • 13. HDT Overview
  • 14. Dictionary+Triples partition
      Dictionary (ID: term)
          1: <http://books/author33>
          2: <http://books/book21>
          3: dc:author
          4: dc:title
          5: foaf:name
          6: “Pablo Neruda”
          7: “Spain in the Heart”
      [Figure: graph whose nodes are labeled with the IDs 1, 2, 6 and 7]
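The dictionary/triples split on slide 14 can be illustrated with a minimal Python sketch. The ID assignments are the ones listed on the slide; the three triples are an assumption inferred from the example terms (the slide transcript does not list the edges), and the function names `encode` and `decode` are hypothetical. This shows the idea only, not the HDT file format.

```python
# Assumed example triples (inferred from the slide's terms, not stated on it).
triples = [
    ("<http://books/book21>", "dc:author", "<http://books/author33>"),
    ("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
    ("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
]

# Dictionary component: ID -> term, with the IDs shown on the slide.
dictionary = {
    1: "<http://books/author33>",
    2: "<http://books/book21>",
    3: "dc:author",
    4: "dc:title",
    5: "foaf:name",
    6: '"Pablo Neruda"',
    7: '"Spain in the Heart"',
}
# Reverse mapping: term -> ID.
term_to_id = {term: term_id for term_id, term in dictionary.items()}


def encode(string_triples):
    """Replace every term by its ID and sort the resulting ID triples."""
    return sorted(tuple(term_to_id[t] for t in triple) for triple in string_triples)


def decode(id_triples):
    """Map ID triples back to their string terms."""
    return [tuple(dictionary[i] for i in triple) for triple in id_triples]


# Triples component: the graph expressed purely with integer IDs.
id_triples = encode(triples)
print(id_triples)
```

Each long term is stored once in the dictionary, and the graph structure itself is reduced to small integers, which is what makes the two components separately compressible.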
  • 15. Key concepts: The Dictionary
      ◦ Largest component (up to 74%)
      ◦ Long URIs, shared prefixes
      ◦ Language and datatype tags in literals
      ◦ Efficient ID-to-string and string-to-ID operations
      We plan to work on a specific organization which
      ◦ Optimizes space (exploiting regularities)
      ◦ Provides efficient performance in operations
  • 16. Preliminary results in Rich Functional Dictionaries
      We propose to adapt techniques for string dictionaries:
      ◦ Front-Coding
      ◦ Making dictionary partitions
      [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández, Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).
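As a rough illustration of why Front-Coding suits URI-heavy dictionaries, the sketch below stores each term of a sorted list as the length of the prefix it shares with the previous term plus the remaining suffix. It is a plain, unbucketed variant written for this transcript, not the structure evaluated in the cited paper; the function names are hypothetical and the term `http://books/book22` is made up for the example.

```python
def shared_prefix_len(a, b):
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(sorted_terms):
    """Encode each term as (shared prefix length with previous term, suffix)."""
    encoded, prev = [], ""
    for term in sorted_terms:
        shared = shared_prefix_len(prev, term)
        encoded.append((shared, term[shared:]))
        prev = term
    return encoded


def front_decode(encoded):
    """Rebuild the original terms by reusing the prefix of the previous term."""
    terms, prev = [], ""
    for shared, suffix in encoded:
        term = prev[:shared] + suffix
        terms.append(term)
        prev = term
    return terms


# Terms must be sorted so that neighbours share long prefixes.
terms = sorted([
    "http://books/author33",
    "http://books/book21",
    "http://books/book22",  # hypothetical extra term
])
encoded = front_encode(terms)
print(encoded)
```

Practical variants usually store a full string at regular intervals so that a lookup only decodes a short run of entries instead of the whole list.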
  • 17. Key concepts: Triples
      ◦ Specific compression: more efficient compression than just gzip
      ◦ Data indexing for consumption: allows direct resolution of the patterns (s,p,o), (s,?p,?o) and (s,p,?o) without decompression
      We plan to work on a specific technique which
      ◦ Optimizes space
      ◦ Provides efficient performance in primitive operations
  • 18. Preliminary results in Triples Encoding
      We propose to use Bitmap indexes.
      [*] Compact Representation of Large RDF Data Sets for Publishing and Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez. International Semantic Web Conference (ISWC 2010).
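One way to picture a bitmap-based triples encoding is sketched below in Python. It assumes ID triples sorted by subject, predicate, object, with subject IDs running contiguously from 1, and it assumes that a 1 bit marks the last entry of each list; these conventions, the function names and the sample triples are choices made for this sketch rather than details taken from the slides. The linear scans stand in for the constant-time rank/select operations a real succinct bitmap would provide.

```python
def build_bitmap_triples(id_triples):
    """Return (preds, pred_bits, objs, obj_bits) for SPO-sorted ID triples.

    pred_bits[i] == 1 marks the last predicate of a subject;
    obj_bits[i] == 1 marks the last object of a (subject, predicate) pair.
    """
    preds, pred_bits, objs, obj_bits = [], [], [], []
    prev_s = prev_p = None
    for s, p, o in sorted(set(id_triples)):
        new_subject = s != prev_s
        if new_subject or p != prev_p:
            if obj_bits:
                obj_bits[-1] = 1          # close the previous object list
            if new_subject and pred_bits:
                pred_bits[-1] = 1         # close the previous predicate list
            preds.append(p)
            pred_bits.append(0)
        objs.append(o)
        obj_bits.append(0)
        prev_s, prev_p = s, p
    if pred_bits:                         # close the final lists
        pred_bits[-1] = 1
        obj_bits[-1] = 1
    return preds, pred_bits, objs, obj_bits


def list_bounds(bits, k):
    """Half-open range of the k-th list (1-based); KeyError if it is missing."""
    start, seen = 0, 0
    for pos, bit in enumerate(bits):
        if bit:
            seen += 1
            if seen == k:
                return start, pos + 1
            start = pos + 1
    raise KeyError(k)


def objects_of(s, p, preds, pred_bits, objs, obj_bits):
    """Resolve the pattern (s, p, ?o) directly on the encoded form."""
    lo, hi = list_bounds(pred_bits, s)
    for pos in range(lo, hi):
        if preds[pos] == p:
            o_lo, o_hi = list_bounds(obj_bits, pos + 1)
            return objs[o_lo:o_hi]
    return []


# Sample ID triples (assumed for illustration).
sample = [(1, 5, 6), (2, 3, 1), (2, 4, 7)]
index = build_bitmap_triples(sample)
print(index)
print(objects_of(2, 4, *index))
```

Because subjects are implicit in the position of their predicate list, only the predicate and object sequences plus two bitmaps need to be stored, and a pattern such as (s,p,?o) is answered without rebuilding the original triples.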
  • 19. Methodology
      ◦ RDF structure in theory and practice
      ◦ Binary RDF Specification
      ◦ Succinct Dictionaries
      ◦ Triples Indexes
      ◦ Practical deployment
      Image: jscreationzs / FreeDigitalPhotos.net
  • 20. Some Results…
      HDT acknowledged as a W3C Member Submission: http://www.w3.org/Submission/2011/03/
      [Figure: logos of the supporting organizations]
  • 21. Some Results... HDT for exchanging
  • 22. Some Results... HDT for consumption
      ◦ Direct consumption, without decompression after exchanging
      ◦ Example of use: HDT-it (thanks to Mario Arias, DERI)
      Image: jscreationzs / FreeDigitalPhotos.net
  • 23. On-going promising work: HDT-FoQ
      [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto, Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC 2012). To appear.
  • 24. In conclusion
      Binary RDF aims to make the Web of Data more lightweight:
      ◦ Logical decomposition: Header, Dictionary, and Triples
      ◦ Clean publication
      ◦ Compressed RDF format for exchanging
      ◦ Machine-friendly, direct consumption
      ◦ Rich Functional Dictionary/Triples representations for querying
  • 25. Still much work on…
      ◦ Getting a global understanding of the real structure of RDF networks
      ◦ Applying this knowledge in innovative dictionary and triples indexes
      ◦ Full SPARQL at consumption
      ◦ Supporting dynamic operations: inserting, deleting, and updating binary RDF
  • 26. Thanks!
      HDT: http://www.rdfhdt.org/
      Group: http://dataweb.infor.uva.es/
      Slides: http://www.slideshare.net/javifer
      Javier D. Fernández (jfergar@infor.uva.es)
      Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez
      University of Valladolid (Spain), University of Chile (Chile)