Binary RDF for Scalable Publishing,
   Exchanging and Consumption
        in the Web of Data

Javier D. Fernández
Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez

                                        University of Valladolid (Spain)
                                            University of Chile (Chile)




PhD Symposium
Brief RDF Introduction

(1) Resource Description Framework: describes resources of any kind
     Webs, services, protocols
     Persons, proteins, geography…


(2) A standard model for data exchange on the Web
    Understandable by computers


(3) W3C Recommendation (2004)

(4) Data model
    (subject, predicate, object)
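The triple model fits in a few lines of code. The following is a minimal Python sketch, not an RDF library. The class names (`URI`, `BlankNode`, `Literal`, `Triple`) are hypothetical. The checks encode the standard RDF position rules: subjects are URIs or blank nodes, predicates are URIs, and objects may also be literals.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None  # optional language tag; datatypes omitted in this sketch


Term = Union[URI, BlankNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Union[URI, BlankNode]
    predicate: URI
    obj: Term

    def __post_init__(self) -> None:
        # Enforce the RDF term-position rules when the triple is built.
        if not isinstance(self.subject, (URI, BlankNode)):
            raise TypeError("subject must be a URI or a blank node")
        if not isinstance(self.predicate, URI):
            raise TypeError("predicate must be a URI")
        if not isinstance(self.obj, (URI, BlankNode, Literal)):
            raise TypeError("object must be a URI, a blank node, or a literal")


# Prefixed names such as "dc:title" are left unexpanded, as on the slides.
book_title = Triple(URI("http://books/book21"), URI("dc:title"), Literal("Spain in the Heart"))
```

Frozen dataclasses are used instead of plain tuples so that, for example, `URI("x")` and `Literal("x")` never compare equal.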


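RDF Example
Subject, Predicate, Object
(U,B), U, (U,B,L)
(U = URI, B = blank node, L = literal)

[Figure: an example RDF graph with the URI nodes <http://books/book21> and <http://books/author33>, the literals “Pablo Neruda” and “Spain in the Heart”, a blank node _collection, the URI <http://myblog/lectures>, and the property lectures:to_read_list.]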

Linked Data principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
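Principles 2 and 3 come down to an HTTP GET with content negotiation. Below is a minimal sketch using Python's standard `urllib.request`. The helper names and the `Accept` preference order are assumptions. `look_up` also needs network access and a server that actually returns RDF for the URI.

```python
import urllib.request

# Media types requested via content negotiation; the preference order is an assumption.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> urllib.request.Request:
    """Prepare an HTTP request that asks the server for RDF about `uri`."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: names should be HTTP URIs so that they can be looked up.
        raise ValueError(f"not an HTTP URI: {uri}")
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def look_up(uri: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Dereference `uri` (principle 3); returns (content type, body). Needs network access."""
    # urlopen follows HTTP redirects (e.g. 303 See Other) by default.
    with urllib.request.urlopen(build_lookup_request(uri), timeout=timeout) as response:
        return response.headers.get("Content-Type", ""), response.read()
```

For example, `look_up("http://dbpedia.org/resource/Pablo_Neruda")` would ask a Linked Data server for an RDF description of that resource. What actually comes back depends on the server.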




Scalability problems



                    DBpedia (en)   233 M triples   ~33 GB
                    UniProt        845 M triples   ~230 GB



   Publish?
   Exchange?
   Process/Consume/Query?
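The figures above can be turned into bytes per triple and transfer time to make the sizes concrete. This is a back-of-the-envelope sketch. It assumes decimal gigabytes and a hypothetical sustained 100 Mbit/s link with no protocol overhead.

```python
GB = 10**9  # assumption: decimal gigabytes

# Figures as quoted on the slide: (number of triples, size in GB).
DATASETS = {
    "DBpedia (en)": (233_000_000, 33),
    "UniProt": (845_000_000, 230),
}


def bytes_per_triple(triples: int, size_gb: float) -> float:
    return size_gb * GB / triples


def download_hours(size_gb: float, mbit_per_s: float) -> float:
    """Transfer time at a sustained link speed, ignoring protocol overhead."""
    bits = size_gb * GB * 8
    return bits / (mbit_per_s * 10**6) / 3600


for name, (triples, size_gb) in DATASETS.items():
    print(f"{name}: {bytes_per_triple(triples, size_gb):.0f} B/triple, "
          f"{download_hours(size_gb, 100):.1f} h at 100 Mbit/s")
```

Under these assumptions, the plain dumps cost roughly 140 to 270 bytes per triple. The UniProt dump alone needs about five hours to download.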



RDF Publication



[Figure: RDF data (e.g., from sensors) published as dereferenceable URIs, as RDF dumps, or through SPARQL endpoints/APIs.]


 No recommendations or methodology for publishing at large scale.
 Related work: some metadata for discovery, such as VoID and Semantic
  Sitemaps.
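VoID descriptions are small Turtle documents that point to a dataset's dump and endpoint. Below is a minimal sketch that renders one. The example.org URIs are placeholders, and the sketch uses only a handful of VoID terms (`void:Dataset`, `void:triples`, `void:dataDump`, `void:sparqlEndpoint`).

```python
from typing import Optional


def void_description(dataset: str, triples: int, dump: str,
                     endpoint: Optional[str] = None) -> str:
    """Render a minimal VoID description of an RDF dataset as Turtle."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "",
        f"<{dataset}> a void:Dataset ;",
        f"    void:triples {triples} ;",
    ]
    if endpoint is not None:
        lines.append(f"    void:sparqlEndpoint <{endpoint}> ;")
    lines.append(f"    void:dataDump <{dump}> .")
    return "\n".join(lines)


print(void_description("http://example.org/dataset", 233_000_000,
                       "http://example.org/dump.nt.gz",
                       endpoint="http://example.org/sparql"))
```

Metadata like this helps consumers discover a dataset. It says nothing about how the dump itself is encoded.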




RDF Exchanging issues
 RDF/XML, N3, Turtle, JSON.
          Document-centric (verbose), rather than a data-centric (machine-oriented) view
 No structure (plain chunks, universal compression)



 Related Work: Universal compression (gzip, bzip2) and the Efficient XML
  Interchange Format (EXI).
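The limits of universal compression are easy to see on synthetic data. This Python sketch uses the standard-library `gzip` and `bz2` modules on a made-up dataset. It measures how much each compressor saves on an N-Triples dump with long shared prefixes.

```python
import bz2
import gzip


def to_ntriples(triples) -> bytes:
    """Serialize (s, p, o) strings as N-Triples lines."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples).encode("utf-8")


def compression_report(raw: bytes) -> dict:
    """Size of `raw` before and after two universal (RDF-agnostic) compressors."""
    return {"raw": len(raw), "gzip": len(gzip.compress(raw)), "bzip2": len(bz2.compress(raw))}


# Synthetic dump: long URIs sharing prefixes, as in real Linked Data.
sample = [
    (f"<http://books/book{i}>", "<http://purl.org/dc/terms/title>", f'"Title {i}"')
    for i in range(1000)
]
raw = to_ntriples(sample)
report = compression_report(raw)
print(report)

# Universal compressors offer no random access: reading any triple
# means decompressing the stream up to it.
assert gzip.decompress(gzip.compress(raw)) == raw
```

Both compressors shrink the dump considerably. They do not know about triples, however, so the consumer still gets an opaque stream that must be decompressed and parsed.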




RDF Processing/Consumption (After Exchanging)
 Costly post-processing
          Decompression
          Indexing (RDF store)
          Finally… consume


 Related work (indexing): relational storage (Virtuoso), multi-indexes
  (RDF-3X), distributed systems (MapReduce), and others (BitMat).
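The three post-processing steps can be sketched end to end. This is a toy pipeline, not an RDF store. It assumes well-formed N-Triples with single spaces between terms and no comments after a triple. It keeps an in-memory subject-to-predicate-to-objects index.

```python
import gzip
from collections import defaultdict


def load_dump(gz_bytes: bytes):
    """Decompress an N-Triples dump and index it as subject -> predicate -> [objects]."""
    text = gzip.decompress(gz_bytes).decode("utf-8")  # step 1: decompression
    index = defaultdict(lambda: defaultdict(list))     # step 2: indexing
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subjects and predicates contain no spaces; the object takes the rest.
        s, p, rest = line.split(" ", 2)
        o = rest.rstrip()
        if o.endswith("."):
            o = o[:-1].rstrip()
        index[s][p].append(o)
    return index


def objects(index, s: str, p: str) -> list:
    """Step 3: consume, resolving the (s, p, ?o) pattern."""
    return index.get(s, {}).get(p, [])


dump = gzip.compress(
    b'<http://books/author33> <http://xmlns.com/foaf/0.1/name> "Pablo Neruda" .\n'
)
idx = load_dump(dump)
print(objects(idx, "<http://books/author33>", "<http://xmlns.com/foaf/0.1/name>"))
```

Even in this toy version, nothing can be queried until the whole dump has been decompressed and indexed. This is the cost the slide points to.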




The scalability problems have
a major impact on users

         Would you download hundreds of GB...


                                              … if you don’t know exactly what they contain,
                                             if they need costly exchange and post-processing,
                                                and if they require a powerful store to query them?




In the following...
1. Proposed approach for scalable publishing, exchanging and consumption
   of large RDF datasets
2. Preliminary results
3. Methodology
4. On-going work and conclusions




An integrated solution
We call for, and we study in this thesis, a Binary RDF Serialization format:
     Machine oriented (binary)
     Clean publication
               Metadata
               Modular
     Efficient exchange
               Compression
     Basic data operations
               Easy to parse and consume
               Primitive query resolution




HDT (Header, Dictionary, Triples) Overview




Dictionary+Triples partition

   1   <http://books/author33>
   2   <http://books/book21>
   3   dc:author
   4   dc:title
   5   foaf:name
   6   “Pablo Neruda”
   7   “Spain in the Heart”

[Figure: the example RDF graph redrawn with each term replaced by its dictionary ID.]
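The partition replaces every term with an integer ID, so triples become small ID tuples. Below is a minimal Python sketch that reproduces the IDs above. It assumes the lost figure encodes the three triples listed in the code; this is a reconstruction, not stated explicitly on the slide. Like the slide, it uses a single ID space. The HDT specification splits the dictionary into further sections, which this sketch does not model.

```python
# Terms in the order shown on the slide, so IDs start at 1.
TERMS = [
    "<http://books/author33>",
    "<http://books/book21>",
    "dc:author",
    "dc:title",
    "foaf:name",
    '"Pablo Neruda"',
    '"Spain in the Heart"',
]

# Assumed content of the example graph (reconstructed, not given explicitly).
TRIPLES = [
    ("<http://books/book21>", "dc:author", "<http://books/author33>"),
    ("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
    ("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
]


def build_dictionary(terms):
    """Return (term -> ID, ID -> term) mappings with 1-based IDs."""
    term_to_id = {term: i for i, term in enumerate(terms, start=1)}
    id_to_term = {i: term for term, i in term_to_id.items()}
    return term_to_id, id_to_term


def encode(triples, term_to_id):
    """Replace every term of every triple by its ID."""
    return [tuple(term_to_id[t] for t in triple) for triple in triples]


def decode(id_triples, id_to_term):
    """Map ID triples back to their terms."""
    return [tuple(id_to_term[i] for i in triple) for triple in id_triples]


term_to_id, id_to_term = build_dictionary(TERMS)
id_triples = encode(TRIPLES, term_to_id)
print(id_triples)
```

Long strings are stored once, in the dictionary, and the triples component only handles small integers.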




Key concepts: The Dictionary

   Largest component (up to 74%)
     Long URIs with shared prefixes
     Language and datatype tags in literals
   Efficient ID↔String operations



We plan to work on a specific organization that
  optimizes space (by exploiting regularities)
  provides efficient performance in these operations
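The two basic dictionary operations are locate (string to ID) and extract (ID to string). If terms are kept sorted, both are simple: binary search for locate and direct access for extract. Below is a hedged Python sketch. The class name is hypothetical, and returning 0 for an unknown term is a convention assumed here. IDs follow lexicographic order rather than the numbering on the previous slide.

```python
from bisect import bisect_left


class SortedDictionary:
    """Terms kept in lexicographic order; a term's ID is its 1-based position."""

    def __init__(self, terms):
        self.terms = sorted(set(terms))

    def locate(self, term: str) -> int:
        """String -> ID by binary search; returns 0 when the term is absent."""
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return i + 1
        return 0

    def extract(self, term_id: int) -> str:
        """ID -> string by direct access."""
        if not 1 <= term_id <= len(self.terms):
            raise KeyError(term_id)
        return self.terms[term_id - 1]
```

Sorting places terms with shared prefixes next to each other. Compression techniques such as Front-Coding exploit exactly that.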




Preliminary results in Rich Functional Dictionaries

We propose to adapt techniques for string dictionaries:
  Front-Coding
     Partitioning the dictionary




  [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández,
     Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).
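Front-Coding exploits the shared prefixes of sorted strings. Terms are grouped into buckets. The first term of each bucket is stored in full, and each remaining term is stored as the length of the prefix it shares with the previous term plus the remaining suffix. Below is a plain Python sketch. The bucket size of 4 is arbitrary, and any refinements from the cited paper are not reproduced.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def front_code(sorted_terms, bucket_size=4):
    """Encode sorted terms as buckets of (head, [(shared_len, suffix), ...])."""
    buckets = []
    for start in range(0, len(sorted_terms), bucket_size):
        chunk = sorted_terms[start:start + bucket_size]
        entries, prev = [], chunk[0]
        for term in chunk[1:]:
            lcp = common_prefix_len(prev, term)
            entries.append((lcp, term[lcp:]))
            prev = term
        buckets.append((chunk[0], entries))
    return buckets


def fc_extract(buckets, term_id, bucket_size=4):
    """ID -> string: jump to the bucket, then rebuild terms up to the wanted one."""
    bucket, offset = divmod(term_id - 1, bucket_size)
    term, entries = buckets[bucket]
    for lcp, suffix in entries[:offset]:
        term = term[:lcp] + suffix
    return term
```

Extraction decodes at most one bucket. Locate (not shown) can binary-search the bucket heads and then scan a single bucket.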

Key concepts: Triples

   Specific compression:
       More efficient compression than plain gzip.
   Data indexing for consumption:
       Direct resolution of triple patterns without decompression:
           (s,p,o), (s,?p,?o) and (s,p,?o)


We plan to work on a specific technique that
  optimizes space
  provides efficient performance in primitive operations




Preliminary results in Triples Encoding

We propose to use Bitmap indexes:




   [*] Compact Representation of Large RDF Data Sets for Publishing and
       Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez.
       International Semantic Web Conference (ISWC 2010).
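Here is a rough sketch of how bitmaps can index ID triples. After sorting by subject, the predicates of each subject are laid out in one sequence (Sp) and the objects of each subject-predicate pair in another (So). Two bitmaps (Bp, Bo) mark where each list begins. The sketch makes three assumptions:
- A 1-bit marks the start of a list; the exact bit convention in the HDT specification may differ.
- Subjects are numbered 1..n, and each subject has at least one triple.
- Plain Python lists stand in for succinct rank/select structures.

It resolves the three subject-bound patterns listed for the Triples component.

```python
class BitmapTriples:
    """ID triples as two levels of adjacency lists delimited by bitmaps.

    Sp/Bp: predicates of each subject; So/Bo: objects of each (subject, predicate).
    In this sketch a 1-bit marks the first element of a list.
    """

    def __init__(self, id_triples):
        self.sp, self.bp, self.so, self.bo = [], [], [], []
        prev_s, prev_sp = None, None
        for s, p, o in sorted(set(id_triples)):
            if (s, p) != prev_sp:              # new entry in the predicate sequence
                self.sp.append(p)
                self.bp.append(1 if s != prev_s else 0)
                self.bo.append(1)              # ...which opens a new object list
            else:
                self.bo.append(0)
            self.so.append(o)
            prev_s, prev_sp = s, (s, p)
        # Plain "select" tables: start position of the k-th list.
        self.subject_starts = [i for i, b in enumerate(self.bp) if b]
        self.object_starts = [i for i, b in enumerate(self.bo) if b]

    @staticmethod
    def _span(starts, k, total):
        end = starts[k + 1] if k + 1 < len(starts) else total
        return starts[k], end

    def search(self, s, p=None, o=None):
        """Resolve (s,?p,?o), (s,p,?o) or (s,p,o); pass None for unbound terms."""
        if not 1 <= s <= len(self.subject_starts):
            return []
        first, last = self._span(self.subject_starts, s - 1, len(self.sp))
        results = []
        for j in range(first, last):
            if p is not None and self.sp[j] != p:
                continue
            lo, hi = self._span(self.object_starts, j, len(self.so))
            for k in range(lo, hi):
                if o is None or self.so[k] == o:
                    results.append((s, self.sp[j], self.so[k]))
        return results


# ID triples of the running example, using the dictionary IDs listed earlier.
bt = BitmapTriples([(2, 3, 1), (2, 4, 7), (1, 5, 6)])
print(bt.sp, bt.bp, bt.so, bt.bo)
```

Subject IDs index directly into the structure, so no separate subject sequence is stored. The bitmaps are all that is needed to find the lists.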

Methodology
 RDF structure in theory and practice.
 Binary RDF Specification.
 Succinct Dictionaries.
 Triples Indexes.
 Practical deployment.




Some Results… HDT acknowledged as a W3C
Member Submission:
http://www.w3.org/Submission/2011/03/




Some Results... HDT for exchanging




Some Results... HDT for consumption
Direct Consumption, without decompression after exchanging
           Example of use: HDT-it (Thanks to Mario Arias, DERI)




On-going promising work: HDT-FoQ




    [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto,
        Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC
        2012). To appear.
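The subject-bound patterns of the basic Triples component leave out queries that fix only the object. The sketch below is a generic illustration of the kind of extra access path a query-focused layout needs. It is a plain inverted index, not the structures of HDT-FoQ, which this slide does not detail. An object-keyed index answers (?s, ?p, o) patterns over ID triples.

```python
from collections import defaultdict


def build_object_index(id_triples):
    """Inverted index: object ID -> sorted (subject, predicate) pairs."""
    index = defaultdict(list)
    for s, p, o in sorted(set(id_triples)):
        index[o].append((s, p))
    return dict(index)


def match_object(index, o):
    """Resolve the (?s, ?p, o) pattern."""
    return [(s, p, o) for s, p in index.get(o, [])]


idx = build_object_index([(2, 3, 1), (2, 4, 7), (1, 5, 6)])
print(match_object(idx, 7))
```

The trade-off is extra space for the additional index in exchange for patterns that a subject-ordered layout cannot answer directly.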
In conclusion
Binary RDF aims to make the Web of Data lightweight:
    Logical decomposition: Header, Dictionary, and Triples
    Clean publication
    Compressed RDF format for exchanging
    Machine-friendly, direct consumption
         Rich Functional Dictionary/Triples representations for querying




Much work remains on…
 Getting a global understanding of the real structure of RDF networks.
 Applying this knowledge to innovative dictionary and triples indexes.
     Full SPARQL support at consumption time
 Supporting dynamic operations
     Inserting, deleting, and updating binary RDF




Thanks!



HDT:        http://www.rdfhdt.org/
Group: http://dataweb.infor.uva.es/
Slides: http://www.slideshare.net/javifer


                                   Javier D. Fernández (jfergar@infor.uva.es)
                   Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez

                                                   University of Valladolid (Spain)
                                                       University of Chile (Chile)


Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

  • 1. Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data. Javier D. Fernández; supervised by Miguel A. Martínez-Prieto and Claudio Gutierrez. University of Valladolid (Spain), University of Chile (Chile). PhD Symposium
  • 2. Brief RDF Introduction. (1) Resource Description Framework: describes webs, services, protocols, persons, proteins, geography… (2) A standard model for data exchange on the Web, understandable by computers. (3) W3C Recommendation (2004). (4) Data model: (subject, predicate, object) triples.
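  • 3. RDF Example. Allowed terms per position (subject, predicate, object): (U,B), U, (U,B,L), i.e. the subject is a URI or blank node, the predicate a URI, and the object a URI, blank node or literal. Figure: an example graph with the URIs <http://books/book21>, <http://books/author33> and <http://myblog/lectures>, the literals "Pablo Neruda" and "Spain in the Heart", a blank node _collection, and the property lectures:to_read_list.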
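The (U,B), U, (U,B,L) notation above constrains which kind of term may occupy each position of a triple. The following is a minimal sketch, assuming a toy term model (the `Term` class and its `kind` labels are hypothetical, not part of any RDF library), that checks those constraints and prints the example as N-Triples. The three triples are an assumption about the edges drawn in the figure, and the expansions of the dc: and foaf: prefixes are assumed as well.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    kind: str   # hypothetical labels: "U" (URI), "B" (blank node), "L" (literal)
    value: str

# Allowed term kinds per position, following (U,B), U, (U,B,L).
ALLOWED = ({"U", "B"}, {"U"}, {"U", "B", "L"})

def is_valid(triple):
    """Return True if every term sits in a position that admits its kind."""
    return all(t.kind in allowed for t, allowed in zip(triple, ALLOWED))

def to_ntriples(term):
    """Serialize one term in N-Triples syntax (toy version: no escaping)."""
    if term.kind == "U":
        return f"<{term.value}>"
    if term.kind == "B":
        return f"_:{term.value}"
    return f'"{term.value}"'

DC = "http://purl.org/dc/elements/1.1/"   # assumed expansion of dc:
FOAF = "http://xmlns.com/foaf/0.1/"       # assumed expansion of foaf:
book = Term("U", "http://books/book21")
author = Term("U", "http://books/author33")

# Assumed edges of the example graph.
graph = [
    (book, Term("U", DC + "author"), author),
    (author, Term("U", FOAF + "name"), Term("L", "Pablo Neruda")),
    (book, Term("U", DC + "title"), Term("L", "Spain in the Heart")),
]

for triple in graph:
    assert is_valid(triple)
    print(" ".join(to_ntriples(t) for t in triple) + " .")

# A literal in subject position violates the data model.
bad = (Term("L", "Pablo Neruda"), Term("U", FOAF + "name"), author)
print(is_valid(bad))  # False
```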
  • 4. Linked Data principles: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). 4. Include links to other URIs, so that they can discover more things. (Image: Danilo Rizzuti / FreeDigitalPhotos.net)
  • 5. (Image: Danilo Rizzuti / FreeDigitalPhotos.net)
  • 6. Scalability problems. DBpedia (en): 233 M triples, ~33 GB. UniProt: 845 M triples, ~230 GB. How do we publish them? Exchange them? Process, consume and query them?
  • 7. RDF Publication (figure: dereferenceable URIs, RDF dump, sensor, SPARQL endpoints/APIs). There is no recommendation or methodology for publishing at large scale. Related work: some metadata for discovery, such as VoID and Semantic Sitemaps.
  • 8. RDF exchange issues. The serialization formats (RDF/XML, N3, Turtle, JSON) are document-centric (verbose) instead of offering a data-centric (machine) view, and they have no structure, so exchange relies on chunks and universal compression. Related work: universal compression (gzip, bzip2) and the Efficient XML Interchange (EXI) format. (Image: renjith krishnan / FreeDigitalPhotos.net)
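Universal compression, the related-work baseline named above, shrinks verbose RDF well, but the result is an opaque byte stream. The following is a minimal sketch using Python's standard `gzip` module on a synthetic N-Triples dump (all generated URIs are hypothetical). It shows both sides: repetitive URIs compress well, yet checking for a single triple still means decompressing and scanning everything.

```python
import gzip

def synthetic_dump(n):
    """Build 2*n hypothetical N-Triples lines with long, shared URI prefixes."""
    lines = []
    for i in range(n):
        s = f"<http://books/book{i}>"
        lines.append(f'{s} <http://purl.org/dc/elements/1.1/title> "Title {i}" .')
        lines.append(f"{s} <http://purl.org/dc/elements/1.1/author> <http://books/author{i % 50}> .")
    return ("\n".join(lines) + "\n").encode("utf-8")

raw = synthetic_dump(1000)
packed = gzip.compress(raw)
print(len(raw), len(packed), f"ratio={len(packed) / len(raw):.2%}")

# The compressed stream is not queryable as-is: to test whether one triple
# is present, the whole dump must be decompressed and scanned.
text = gzip.decompress(packed).decode("utf-8")
print('<http://books/book7> <http://purl.org/dc/elements/1.1/title> "Title 7" .' in text)
```

The compression ratio you observe depends on the synthetic data and is not a measurement of any real dataset.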
  • 9. RDF Processing/Consumption (after exchange). Costly post-processing: decompression, then indexing into an RDF store, and only then can the data finally be consumed. Related work (indexing): relational storage (Virtuoso), multi-indexes (RDF-3X), distributed systems (MapReduce) and others (BitMat). (Image: renjith krishnan / FreeDigitalPhotos.net)
  • 10. The scalability problems have a major impact on users. Would you download hundreds of GB… if you don't know exactly what they contain, if they need costly exchange and post-processing, and if they require a powerful store to query them? (Image: renjith krishnan / FreeDigitalPhotos.net)
  • 11. In the following… 1. Proposed approach for scalable publishing, exchanging and consumption of large RDF datasets. 2. Preliminary results. 3. Methodology. 4. Ongoing work and conclusions. (Image: jscreationzs / FreeDigitalPhotos.net)
  • 12. An integrated solution. We call for, and study in this thesis, a Binary RDF Serialization format offering: machine orientation (binary); clean publication; metadata; modularity; efficient exchange; compression; basic data operations; ease of parsing and consumption; primitive query resolution. (Image: jscreationzs / FreeDigitalPhotos.net)
  • 13. HDT Overview
  • 14. Dictionary + Triples partition. Dictionary: 1 <http://books/author33>, 2 <http://books/book21>, 3 dc:author, 4 dc:title, 5 foaf:name, 6 "Pablo Neruda", 7 "Spain in the Heart". Triples: the example graph rewritten with these IDs (figure).
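The dictionary/triples split can be sketched with the IDs listed on this slide. This is a minimal sketch under two assumptions: the three string triples are a guess at the edges of the running example, and a single shared numbering is used as on the slide, which is simpler than a real HDT dictionary (that one organizes terms into separate sections). `encode` and `decode` are hypothetical helper names.

```python
# Dictionary from the slide: ID -> term.
id_to_term = {
    1: "<http://books/author33>",
    2: "<http://books/book21>",
    3: "dc:author",
    4: "dc:title",
    5: "foaf:name",
    6: '"Pablo Neruda"',
    7: '"Spain in the Heart"',
}
# Inverse mapping for the string -> ID direction.
term_to_id = {term: i for i, term in id_to_term.items()}

# Assumed triples of the running example, as strings.
triples = [
    ("<http://books/book21>", "dc:author", "<http://books/author33>"),
    ("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
    ("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
]

def encode(str_triples):
    """Replace every term by its dictionary ID and sort the ID-triples (SPO order)."""
    return sorted(tuple(term_to_id[t] for t in triple) for triple in str_triples)

def decode(id_triples):
    """Map ID-triples back to terms (the ID -> string operation)."""
    return [tuple(id_to_term[i] for i in t) for t in id_triples]

encoded = encode(triples)
print(encoded)  # [(1, 5, 6), (2, 3, 1), (2, 4, 7)]
assert sorted(decode(encoded)) == sorted(triples)
```

Once terms are replaced by small integers, the triples part only stores IDs, which is what makes the specific compression and indexing of the following slides possible.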
  • 15. Key concepts: the Dictionary. It is the largest component (up to 74%) because of long URIs with shared prefixes and language and datatype tags in literals, and it must support efficient ID-to-string and string-to-ID operations. We plan to work on a specific organization that optimizes space (by exploiting regularities) and provides efficient performance in these operations.
  • 16. Preliminary results in Rich Functional Dictionaries. We propose to adapt techniques for string dictionaries: Front-Coding, and partitioning the dictionary. [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández, Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).
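Front-Coding stores a sorted list of strings in buckets: the first string of each bucket is kept whole, and each following string keeps only the length of the prefix it shares with its predecessor plus the remaining suffix. The following is a minimal sketch, assuming a bucket size of 4 and plain Python lists instead of a packed byte layout (both are assumptions, not the encoding of the cited paper; `FrontCodedDict`, `locate` and `extract` are hypothetical names). `locate` (string to ID) binary-searches the bucket headers and then scans one bucket; `extract` (ID to string) decodes one bucket.

```python
from bisect import bisect_right

def lcp(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

class FrontCodedDict:
    """Front-coded dictionary of sorted, distinct strings; IDs start at 1."""

    def __init__(self, strings, bucket_size=4):
        if strings != sorted(set(strings)):
            raise ValueError("input must be sorted and distinct")
        self.size = len(strings)
        self.bucket_size = bucket_size
        self.headers = []   # first string of each bucket, stored in full
        self.buckets = []   # per bucket: list of (shared_prefix_len, suffix)
        for start in range(0, len(strings), bucket_size):
            chunk = strings[start:start + bucket_size]
            self.headers.append(chunk[0])
            self.buckets.append([(lcp(prev, cur), cur[lcp(prev, cur):])
                                 for prev, cur in zip(chunk, chunk[1:])])

    def _decode_bucket(self, b):
        """Yield the strings of bucket b in order."""
        cur = self.headers[b]
        yield cur
        for k, suffix in self.buckets[b]:
            cur = cur[:k] + suffix
            yield cur

    def extract(self, ident):
        """ID -> string."""
        if not 1 <= ident <= self.size:
            raise KeyError(ident)
        b, offset = divmod(ident - 1, self.bucket_size)
        for i, s in enumerate(self._decode_bucket(b)):
            if i == offset:
                return s

    def locate(self, string):
        """String -> ID, or 0 if the string is absent."""
        b = bisect_right(self.headers, string) - 1
        if b < 0:
            return 0
        for i, s in enumerate(self._decode_bucket(b)):
            if s == string:
                return b * self.bucket_size + i + 1
        return 0

uris = sorted([
    "http://books/author33", "http://books/author34",
    "http://books/book21", "http://books/book22",
    "http://myblog/lectures",
])
d = FrontCodedDict(uris)
print(d.buckets[0])  # [(20, '4'), (13, 'book21'), (18, '2')]
print(d.locate("http://books/book22"), d.extract(5))
```

Long URIs with shared prefixes, as noted for the dictionary, are exactly the case where most of each string collapses into a small prefix length.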
  • 17. Key concepts: Triples. Specific compression: more efficient than plain gzip. Data indexing for consumption: allows direct resolution of the patterns (s,p,o), (s,?p,?o) and (s,p,?o) without decompression. We plan to work on a specific technique that optimizes space and provides efficient performance in primitive operations.
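The three patterns listed above all bind the subject, so a structure ordered by subject can answer them without scanning everything. The following is a minimal sketch, assuming the ID-triples are kept in a plain Python list sorted in subject-predicate-object order (a stand-in for a compressed structure, not the HDT implementation; `resolve` is a hypothetical name). It answers each pattern with two binary searches from the standard `bisect` module.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")

def resolve(spo_sorted, s, p=None, o=None):
    """Return the triples matching (s,p,o), (s,?p,?o) or (s,p,?o).

    spo_sorted: list of (s, p, o) integer tuples in SPO order.
    p and o may be None (unbound); (s,?p,o) is outside this sketch.
    """
    if p is None and o is not None:
        raise ValueError("(s,?p,o) is not one of the patterns handled here")
    if p is None:
        key = (s,)
    elif o is None:
        key = (s, p)
    else:
        key = (s, p, o)
    # A shorter tuple sorts before all its extensions, so `key` is a lower
    # bound; padding with infinity gives an upper bound for the same prefix.
    lo = bisect_left(spo_sorted, key)
    hi = bisect_right(spo_sorted, key + (INF,) * (3 - len(key)))
    return spo_sorted[lo:hi]

# ID-triples of the running example (assumed), sorted in SPO order.
spo = [(1, 5, 6), (2, 3, 1), (2, 4, 7)]
print(resolve(spo, 2))        # [(2, 3, 1), (2, 4, 7)]
print(resolve(spo, 2, 4))     # [(2, 4, 7)]
print(resolve(spo, 2, 3, 1))  # [(2, 3, 1)]
```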
  • 18. Preliminary results in Triples Encoding. We propose to use bitmap indexes. [*] Compact Representation of Large RDF Data Sets for Publishing and Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez. International Semantic Web Conference (ISWC 2010).
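One way to picture bitmap-based triples is in the spirit of HDT's BitmapTriples, simplified here. ID-triples sorted by subject form a forest: predicate IDs go into a sequence Sp with a bitmap Bp, and object IDs into So with a bitmap Bo. In this sketch a 1-bit marks the last element of each list, subjects are implicit (assumed to be the consecutive IDs 1..n), and `select1` is a naive linear scan for readability. These are all assumptions; practical implementations use succinct bitmaps with fast rank/select. Function names are hypothetical.

```python
from itertools import groupby

def build_bitmap_triples(spo_sorted):
    """Encode SPO-sorted ID-triples as (Sp, Bp, So, Bo); subjects stay implicit."""
    sp, bp, so, bo = [], [], [], []
    expected = 1
    for s, s_group in groupby(spo_sorted, key=lambda t: t[0]):
        if s != expected:
            raise ValueError("subjects must be the consecutive IDs 1..n")
        expected += 1
        preds = [(p, [o for _, _, o in grp])
                 for p, grp in groupby(s_group, key=lambda t: t[1])]
        for i, (p, objs) in enumerate(preds):
            sp.append(p)
            bp.append(1 if i == len(preds) - 1 else 0)   # last predicate of s
            for j, o in enumerate(objs):
                so.append(o)
                bo.append(1 if j == len(objs) - 1 else 0)  # last object of (s, p)
    return sp, bp, so, bo

def select1(bits, k):
    """Position of the k-th 1-bit (k >= 1); -1 for k == 0. Linear scan."""
    if k == 0:
        return -1
    seen = 0
    for pos, b in enumerate(bits):
        seen += b
        if seen == k:
            return pos
    raise IndexError(k)

def triples_of_subject(enc, s):
    """Decode every (s, p, o) triple of subject ID s, i.e. pattern (s,?p,?o)."""
    sp, bp, so, bo = enc
    out = []
    # Predicates of subject s lie between the (s-1)-th and s-th 1-bits of Bp.
    for y in range(select1(bp, s - 1) + 1, select1(bp, s) + 1):
        # Objects of entry y lie between the y-th and (y+1)-th 1-bits of Bo.
        for z in range(select1(bo, y) + 1, select1(bo, y + 1) + 1):
            out.append((s, sp[y], so[z]))
    return out

spo = [(1, 5, 6), (2, 3, 1), (2, 4, 7)]   # running example (assumed)
enc = build_bitmap_triples(spo)
print(enc)  # ([5, 3, 4], [1, 0, 1], [6, 1, 7], [1, 1, 1])
print(triples_of_subject(enc, 2))  # [(2, 3, 1), (2, 4, 7)]
```

Only IDs and one bit per list entry are stored, yet a subject's triples can be read directly from the encoded form.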
  • 19. Methodology: RDF structure in theory and practice; Binary RDF specification; succinct dictionaries; triples indexes; practical deployment. (Image: jscreationzs / FreeDigitalPhotos.net)
  • 20. Some Results… HDT acknowledged as a W3C Member Submission: http://www.w3.org/Submission/2011/03/ (supporters shown as logos on the original slide).
  • 21. Some Results… HDT for exchanging
  • 22. Some Results… HDT for consumption: direct consumption, without decompression after exchange. Example of use: HDT-it (thanks to Mario Arias, DERI). (Image: jscreationzs / FreeDigitalPhotos.net)
  • 23. Ongoing promising work: HDT-FoQ. [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto, Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC 2012), to appear.
  • 24. In conclusion, Binary RDF aims to make the Web of Data lightweight: a logical decomposition into Header, Dictionary and Triples; clean publication; a compressed RDF format for exchange; machine-friendly, direct consumption; rich functional Dictionary and Triples representations for querying.
  • 25. Still much work on… getting a global understanding of the real structure of RDF networks; applying this knowledge in innovative dictionary and triples indexes; full SPARQL at consumption; supporting dynamic operations (inserting, deleting and updating binary RDF).
  • 26. Thanks! HDT: http://www.rdfhdt.org/ Group: http://dataweb.infor.uva.es/ Slides: http://www.slideshare.net/javifer Javier D. Fernández (jfergar@infor.uva.es) Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez University of Valladolid (Spain) University of Chile (Chile) PhD Symposium