Industrialized Linked Data

     Dave Reynolds, Epimorphics Ltd
                            @der42
Context: public sector Linked Data

Linked Data journey ...

    explore
   what is linked data?
   what use is it for us?

   self-describing: carries semantics with it; annotate and explain; data in context ...
   integration: comparable; slice and dice; web API ...

   what’s involved?
Linked Data journey ...

    explore → pilot

    [diagram: pipeline of data → model → convert → publish → apply]

Photo of The Thinker © dSeneste.dk@flickr CC BY

Linked Data journey ...

    explore → pilot → routine?
Great pilot but ...
 can we reduce the time and cost?
 how do we handle changes and updates?
 how can we make the published data easier to use?


How do we make Linked Data “business as usual”?
Example case study: Environment Agency
   monitoring of bathing
    water quality
   static pilot
   live pilot
       historic annual
        assessments
       weekly assessments
   operational system
       additional data feeds
       live update
       integrated API
       data explorer
From pilot to practice
   reduce modelling costs
        patterns (dive 1)
       reuse
   handling change and update
       patterns
       publication process
   automation
       conversion
       publication
   embed in the business process
       use internally as well as externally
       publish once, use many
       data platform
Reduce costs - modelling
1. Don’t do it
     map source data into isomorphic RDF, synthesize URIs
     loses some of the value proposition
2. Reuse existing ontologies intact or mix-and-match
     best solution when available
     W3C GLD work on vocabularies – people, organizations,
      datasets ...
3. Reusable vocabulary patterns
     example:
         Data cube plus reference URI sets
         adaptable to broad range of data – environmental, statistical,
          financial ...
Reusable patterns: Data cube
   Much public sector data has regularities
       set of measures
            observations, forecasts, budgets, assessments, statistics ...




[diagram: example measures, mixing numeric readings (>0.1, 34, 27, 125) with quality classifications (excellent, good, poor)]
Reusable patterns: Data cube
   Much public sector data has regularities
       sets of measures
           observations, forecasts, budgets, assessments, estimates ...
       organized along some dimensions
           region, agency, time, category, cost centre ...




[diagram: a spend measure organized along objective code, cost centre and time dimensions, with example values 12/15/25, 8/9/11 and 120/130/180]
Reusable patterns: Data cube
   Much public sector data has regularities
       sets of measures
           observations, forecasts, budgets, assessments, estimates ...
       organized along some dimensions
           region, agency, time, category, cost centre ...
       interpreted according to attributes
           units, multipliers, status

[diagram: the same spend cube with attributes: values carry units and multipliers ($12k, $120k, ...) and a status (provisional, final)]
Data cube vocabulary
Data cube pattern
   Pattern, not a fixed ontology
       customize by selecting measures, dimensions and attributes
       originated in publishing of statistics
       applied to environment measurements, weather forecasts, budgets
        and spend, quality assessments, regional demographics ...
   Supports reuse
       widely reusable URI sets – geography, time periods, agencies, units
       organization-wide sets
       modelling often only requires small increments on top of core
        pattern and reusable components
   opens door for reusable visualization tools
   standardization through W3C GLD
Application to case study
   Data Cubes for water quality measurement
       in-season weekly assessments
       end of season annual assessments
   dimensions:
       time intervals – UK reference time service
       location - reference URI set for bathing waters and sampling points
   cubes can reuse these dimensions
       just need to define specific measures
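As a Turtle sketch (the bwq: terms follow the deck's examples; the namespace URI and component names are illustrative assumptions), the customization really is a small increment: a Data Structure Definition that reuses the shared dimensions and adds one dataset-specific measure.

```turtle
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix bwq: <http://environment.data.gov.uk/def/bathing-water-quality/> .

<#annual-compliance-dsd> a qb:DataStructureDefinition ;
    qb:component
        [ qb:dimension bwq:sampleYear ] ,     # reused time dimension
        [ qb:dimension bwq:bathingWater ] ,   # reused location dimension
        [ qb:measure   bwq:classification ] . # dataset-specific measure
```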
From pilot to practice
   reduce modelling costs
       patterns
       reuse
   handling change and update
        patterns (dive 2)
       publication process
   automation
       conversion
       publication
   embed in the business process
       use internally as well as externally
       publish once, use many
       data platform
Handling change
   critical challenge
       most initial pilots choose a snapshot dataset
           and go stale, fast
       understanding the nature of data updates and how to handle
        them is critical to successful scaling to business as usual
   types of change
       new data related to different time period
       corrections to data
       entities change
           properties
           identity
Modelling change
1. Individual data items relate to new time period
Pattern: n-ary relation
        observation resource relates value to time period and other context
        use Data Cube dimensions for this
[diagram: Clevedon Beach (http://environment.data.gov.uk/id/bathing-water/ukk1202-36000) linked via bwq:bathingWater to one observation per year; each observation carries a bwq:sampleYear (http://reference.data.gov.uk/id/year/2009, .../2010, .../2011) and a bwq:classification (Higher, Minimum, Higher)]

History or latest?
        latest is non-monotonic but helpful for many practical uses
             materialize (SPARQL Update), implement in query, implement in API
        choice whether to keep history as well
             water quality v. weather forecasts
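In Turtle the n-ary pattern might look like this (prefixes and observation URIs are illustrative assumptions based on the deck's examples): each year's classification lives on its own observation resource, so a new period simply adds a new observation.

```turtle
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix bwq: <http://environment.data.gov.uk/def/bathing-water-quality/> .

# one observation resource per sample year; new years add new observations
<urn:obs-2010> a qb:Observation ;
    bwq:bathingWater <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000> ;
    bwq:sampleYear   <http://reference.data.gov.uk/id/year/2010> ;
    bwq:classification bwq:Minimum .

<urn:obs-2011> a qb:Observation ;
    bwq:bathingWater <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000> ;
    bwq:sampleYear   <http://reference.data.gov.uk/id/year/2011> ;
    bwq:classification bwq:Higher .
```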
Modelling change
2. Corrections
   patterns
        silent change (!)
        explicit replacement
             API level hides replaced values but SPARQL query can retrieve & trace
        explicit change event

[diagram: the 2011 assessment for Clevedon Beach; a new observation (classification: Higher) dct:replaces the withdrawn one (classification: Minimum; status: replaced; reason: reanalysis), which points forward via dct:isReplacedBy; an analysis event records ev:before, ev:after, ev:occurredOn and ev:agent]
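A minimal Turtle sketch of the explicit-replacement pattern (dct: is Dublin Core terms; the observation URIs and the status/reason bookkeeping properties are illustrative assumptions):

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix bwq: <http://environment.data.gov.uk/def/bathing-water-quality/> .

# the corrected value replaces the withdrawn one
<urn:obs-2011-v2> bwq:classification bwq:Higher ;
    dct:replaces <urn:obs-2011-v1> .

# the withdrawn value stays retrievable, marked as replaced
<urn:obs-2011-v1> bwq:classification bwq:Minimum ;
    dct:isReplacedBy <urn:obs-2011-v2> ;
    bwq:status "replaced" ;    # illustrative bookkeeping properties
    bwq:reason "reanalysis" .
```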
Modelling change
3. Mutation
   Infrequent change of properties, essential identity remains
     e.g. renaming a school, adding another building
     routine accesses see property value, not function of time
   patterns
     in place update
     named graphs
           current graph + graphs for each previous state + meta-graph
       explicit versioning with open periods
Modelling change
3. Mutation
explicit versioning with open periods
[diagram: an endurant bathing-water resource with two dct:hasVersion links; one version labelled "Clevedon Beach" with a dct:valid interval that time:intervalStarts in 2003 and time:intervalFinishes in 2011, and one labelled "Clevedon Sands" with an open dct:valid interval that time:intervalStarts in 2011]
     find right version by query on validity interval
     simplify use through
         non-monotonic “latest value” link
         API to implement query filters automatically
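Sketched in Turtle (following the slide's vocabulary usage; the resource URIs are illustrative assumptions), the endurant carries version links and each version carries a validity interval, with the latest interval left open:

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix time: <http://www.w3.org/2006/time#> .

<urn:clevedon> dct:hasVersion <urn:clevedon-v1> , <urn:clevedon-v2> .

<urn:clevedon-v1> rdfs:label "Clevedon Beach" ;
    dct:valid [ time:intervalStarts   <http://reference.data.gov.uk/id/year/2003> ;
                time:intervalFinishes <http://reference.data.gov.uk/id/year/2011> ] .

<urn:clevedon-v2> rdfs:label "Clevedon Sands" ;
    dct:valid [ time:intervalStarts <http://reference.data.gov.uk/id/year/2011> ] . # open period
```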
Application to case study
   weekly and annual samples
       use Data Cube pattern (n-ary relation)
   withdrawn samples
       replacement pattern (no explicit change event)
       Data Cube slice for “latest valid assessment”
           generated by a SPARQL Update query
       API gives easy access to the latest valid values
        following the linked data, or a raw SPARQL query, allows drilling into the changes
   changes to bathing water profile
       versioning pattern
       bathing water entity points to latest profile (SPARQL Update again)
From pilot to practice
   reduce modelling costs
       patterns
       reuse
   handling change and update
       patterns
       publication process
   automation
        conversion (dive 3)
       publication
   embed in the business process
       use internally as well as externally
       publish once, use many
       data platform
Automation
Transform and publish data feed increments
    transformation engine service
    reusable mappings, low cost to adapt to new feeds
    linking to reference data
    publication service that supports non-monotonic changes




[diagram: data increments (csv) → transform service (driven by reusable xform specs, with a reconciliation service and reference data) → publication service → replicated publication servers]
Transformation service
   declarative specification of transform
       a single service supports a range of transformations
       easy to adapt transformation to new feeds and modelling
        changes
   R2RML – RDB to RDF Mapping Language
       specify mapping from database tables to RDF triples
       W3C candidate recommendation
   D2RML
       R2RML extension to treat CSV feed as a database table
Small D2RML example
:dataSource a dr:CSVDataSource ;
  rdfs:label "dataSource" .

:bathingWaterTermMap a dr:SubjectMap;
  dr:template "http://environment.data.gov.uk/id/bathing-water/{EUBWID2}" ;
  dr:class def-bw:BathingWater .

:bathingWaterMap
  dr:logicalTable :dataSource ;
  dr:subjectMap   :bathingWaterTermMap ;

  dr:predicateObjectMap [
    dr:predicate rdfs:label ;
    dr:objectMap [ dr:column "description_english" ; dr:language "en" ] ] ;

  dr:predicateObjectMap [
    dr:predicate def-bw:eubwidNotation ;
    dr:objectMap [ dr:column "EUBWID2" ; dr:datatype def-bw:eubwid ] ] .
Using patterns
   verbosity is a problem; it increases the cost of reuse
   extend to support modelling patterns
   Data Cube
       specify mapping to observation with measures and dimensions
       engine generates Data Set and Data Structure Definition
        automatically
D2RML cube map example
:dataCubeMap a dr:DataCubeMap ;
    rr:logicalTable "dataSource" ;
    dr:datasetIRI "http://example.org/datacube1"^^xsd:anyURI ;
    dr:dsdIRI "http://example.org/myDsd"^^xsd:anyURI ;

    dr:observationMap [
      # instances will automatically link to the base Data Set
      rr:subjectMap [
        rr:termType rr:IRI ;
        rr:template "http://example.org/observation/{PLACE}/{DATE}" ] ;
      # implies an entry in the auto-generated Data Structure Definition
      rr:componentMap [
        dr:componentType qb:measure ;
        rr:predicate aq:concentration ;
        # defines how the measure value is to be represented
        rr:objectMap [ rr:column "NO2" ; rr:datatype xsd:decimal ] ]
      ] ;
    ...
But what about linking?
   connect observations to reference data
       a core value of linked data
   R2RML has Term Maps to create values
       constants and templates
   extend to allow maps based on other data sources
       Lookup map
           lookup resource in a store, fetch predicate
       Reconcile
           specify lookup in a remote service
           use Google Refine reconciliation API
Automation
Transform and publish data feed increments
    transformation engine service ✓
    reusable mappings, low cost to adapt to new feeds ✓
    linking to reference data ✓
    publication service that supports non-monotonic changes




Publication service
   goals
       cope with non-monotonic effects of change representation
       so replication is robust and cheap (=> make it idempotent)
   solution
       SPARQL Update
       publish transformed increment as a simple DATA INSERT
       then run SPARQL Update script for non-monotonic links
            dct:isReplacedBy links
            latest value slices
Sample update script
DELETE {
  ?bw bwq:latestComplianceAssessment ?o .
} WHERE {
  ?bw bwq:latestComplianceAssessment ?o .
} ;

INSERT {
  ?bw bwq:latestComplianceAssessment ?o .
} WHERE {
  {
    ?slice a bwq:ComplianceByYearSlice ;
           bwq:sampleYear [ interval:ordinalYear ?year ] .
    OPTIONAL {
      ?slice2 a bwq:ComplianceByYearSlice ;
              bwq:sampleYear [ interval:ordinalYear ?year2 ] .
      FILTER (?year2 > ?year)
    }
    FILTER ( !bound(?slice2) )
  }
  ?slice qb:observation ?o .
  ?o bwq:bathingWater ?bw .
}
Automation
Transform and publish data feed increments
    transformation engine service ✓
    reusable mappings, low cost to adapt to new feeds ✓
    linking to reference data ✓
    publication service that supports non-monotonic changes ✓




Application to case study
   Update server
       transforms based on scripts (earlier scripting utility)
       linking to reference data
       distributed publication via
        SPARQL Update
       extensible range of data sets
             annual assessments
             in-season assessments
             bathing water profile
             features (e.g. pollution sources)
             reference data
From pilot to practice
   reduce modelling costs
       patterns
       reuse
   handling change and update
       patterns
       publication process
   automation
       conversion
       publication
   embed in the business process (dive 4)
       use internally as well as externally
       publish once, use many
       data platform
Embed in business process
 embedding is critical to ensure data kept up to date
 in turn needs usage
=> lower barrier to use

[diagram: virtuous cycle in which rich, up-to-date data drives external and internal use, which justifies further investment; without use, the data is not used, goes stale, and investment is hard to justify]
Lowering barrier to use
   simple REST APIs
       use Linked Data API specification
       rich query without learning SPARQL
       easy consumption as JSON, XML
       gets developers used to data and data model
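For example, a hypothetical item request against such an API (the URL and response shape are illustrative sketches of the Linked Data API's JSON rendering, not the actual Environment Agency endpoint):

```
GET /doc/bathing-water/ukk1202-36000.json

{ "result": {
    "primaryTopic": {
      "name": "Clevedon Beach",
      "latestComplianceAssessment": { "classification": "Higher" }
    }
} }
```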
[diagram: transform service → publication service → LD API]
Application to case study
   embedded in process for weekly/daily updates
   infrastructure to automate conversion and publishing
   API plus extensive developer documentation
   third party and in-house applications built over API




   publish once, use many
   information products as applications over a data platform,
    usable externally as well as internally
The next stage
   grow range of data publications and uses
       a growing range of reference data and datasets brings new challenges
       discover reference terms and models to reuse
       discover datasets to use for application
       discover models and links between sets
   needs a coordination or registry service
   story for another day ...
Conclusions
   illustrated how public sector users of linked data are moving
    from static pilots to operational systems
   keys are:
       reduce modelling costs through patterns and reuse
       design for continuous update
       automation of publication using declarative mappings and
        SPARQL Update
       lower barrier to use through API design and documentation
       embed in organization’s process so the data is used and useful
Acknowledgements
Only possible thanks to many smart colleagues: Stuart
Williams, Andy Seaborne, Ian Dickinson, Brian McBride,
Chris Dollin
plus Alex Coley and team from the Environment Agency

 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Último (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Industrialized Linked Data

  • 1. Industrialized Linked Data Dave Reynolds, Epimorphics Ltd @der42
  • 3. Linked Data journey ...
       explore
       - what is linked data?
       - what use is it for us?
  • 4. Linked Data journey ...
       explore
       - what is linked data?
       - what use is it for us?
       self-describing: carries semantics with it; annotate and explain; data in context; ...
       integration: comparable; slice and dice; web API; ...
  • 5. Linked Data journey ...
       explore
       - what is linked data?
       - what use is it for us?
       self-describing: carries semantics with it; annotate and explain; data in context; ...
       integration: comparable; slice and dice; web API; ...
       - what’s involved?
  • 6. Linked Data journey ...
       explore → pilot: data → model → convert → publish → apply
       (Photo of The Thinker © dSeneste.dk@flickr, CC BY)
  • 7. Linked Data journey ...
       explore → pilot → routine?
       Great pilot, but ...
       - can we reduce the time and cost?
       - how do we handle changes and updates?
       - how can we make the published data easier to use?
       How do we make Linked Data “business as usual”?
  • 8. Example case study: Environment Agency
       - monitoring of bathing water quality
       - static pilot
       - live pilot
         - historic annual assessments
         - weekly assessments
       - operational system
         - additional data feeds
         - live update
         - integrated API
         - data explorer
  • 9. From pilot to practice
       - reduce modelling costs
         - patterns (dive 1)
         - reuse
       - handling change and update
         - patterns
       - publication process
         - automation
         - conversion
         - publication
       - embed in the business process
         - use internally as well as externally
         - publish once, use many
         - data platform
  • 10. Reduce costs - modelling
        1. Don’t do it
           - map source data into isomorphic RDF, synthesize URIs
           - loses some of the value proposition
        2. Reuse existing ontologies, intact or mix-and-match
           - best solution when available
           - W3C GLD work on vocabularies – people, organizations, datasets ...
        3. Reusable vocabulary patterns
           - example: Data Cube plus reference URI sets
           - adaptable to a broad range of data – environmental, statistical, financial ...
  • 11. Reusable patterns: Data cube
        Much public sector data has regularities
        - sets of measures
          - observations, forecasts, budgets, assessments, statistics ...
        (illustration: a grid of example measure values)
  • 12. Reusable patterns: Data cube
        Much public sector data has regularities
        - sets of measures
          - observations, forecasts, budgets, assessments, estimates ...
        - organized along some dimensions
          - region, agency, time, category, cost centre ...
        (illustration: spend figures laid out by objective code, cost centre and time)
  • 13. Reusable patterns: Data cube
        Much public sector data has regularities
        - sets of measures
          - observations, forecasts, budgets, assessments, estimates ...
        - organized along some dimensions
          - region, agency, time, category, cost centre ...
        - interpreted according to attributes
          - units, multipliers, status
        (illustration: the same spend figures, now marked provisional or final and expressed in $k)
  • 15. Data cube pattern
        - a pattern, not a fixed ontology
          - customize by selecting measures, dimensions and attributes
          - originated in the publishing of statistics
          - applied to environment measurements, weather forecasts, budgets and spend, quality assessments, regional demographics ...
        - supports reuse
          - widely reusable URI sets – geography, time periods, agencies, units
          - organization-wide sets
          - modelling often only requires small increments on top of the core pattern and reusable components
          - opens the door for reusable visualization tools
          - standardization through W3C GLD
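To make the pattern concrete, a single customized observation might look like the following Turtle sketch. This is an illustration, not the project's actual data: the ex: dataset, dimension and measure terms are invented, while the bathing-water and year URIs follow the reference URI sets used in the case study.

```turtle
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/def/bwq#> .

# One observation in a customized cube: the dataset it belongs to, two
# reused dimensions (bathing water, sample year), and one local measure.
ex:obs-ukk1202-36000-2011 a qb:Observation ;
    qb:dataSet        ex:annual-compliance ;
    ex:bathingWater   <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000> ;
    ex:sampleYear     <http://reference.data.gov.uk/id/year/2011> ;
    ex:classification ex:Higher .
```

Only the ex:classification measure is dataset-specific; the two dimension URI sets are exactly the reusable components the slide describes.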
  • 16. Application to case study
        - Data Cubes for water quality measurement
          - in-season weekly assessments
          - end-of-season annual assessments
        - dimensions:
          - time intervals – UK reference time service
          - location – reference URI set for bathing waters and sample points
        - cubes can reuse these dimensions
          - just need to define the specific measures
  • 17. From pilot to practice
        - reduce modelling costs
          - patterns
          - reuse
        - handling change and update
          - patterns (dive 2)
        - publication process
          - automation
          - conversion
          - publication
        - embed in the business process
          - use internally as well as externally
          - publish once, use many
          - data platform
  • 18. Handling change
        - a critical challenge
          - most initial pilots choose a snapshot dataset – and go stale, fast
          - understanding the nature of data updates, and how to handle them, is critical to scaling successfully to business as usual
        - types of change
          - new data relating to a different time period
          - corrections to data
          - entities change
            - properties
            - identity
  • 19. Modelling change 1. Individual data items relate to a new time period
        Pattern: n-ary relation
        - an observation resource relates a value to a time period and other context
        - use Data Cube dimensions for this
        (diagram: http://environment.data.gov.uk/id/bathing-water/ukk1202-36000 “Clevedon Beach”, linked via bwq:sampleYear and bwq:classification to Higher for http://reference.data.gov.uk/id/year/2009, Minimum for .../year/2010 and Higher for .../year/2011)
        History or latest?
        - “latest” is non-monotonic but helpful for many practical uses
        - materialize (SPARQL Update), implement in the query, or implement in the API
        - choice whether to keep history as well
          - water quality vs. weather forecasts
  • 20. Modelling change 2. Corrections
        patterns:
        - silent change (!)
        - explicit replacement
          - the API level hides replaced values, but a SPARQL query can retrieve and trace them
        - explicit change event
        (diagram: for “Clevedon Beach” in 2011, a replaced classification (Minimum, status: replaced) is linked by dct:isReplacedBy / dct:replaces to the new classification (Higher); an analysis event with reason: reanalysis carries ev:before, ev:after, ev:occuredOn and ev:agent links)
  • 21. Modelling change 3. Mutation
        - infrequent change of properties; essential identity remains
          - e.g. renaming a school, adding another building
        - routine accesses see the property value, not a function of time
        - patterns:
          - in-place update
          - named graphs
            - current graph + graphs for each previous state + meta-graph
          - explicit versioning with open periods
  • 22. Modelling change 3. Mutation – explicit versioning with open periods
        (diagram: an endurant with dct:hasVersion links to two versions: “Clevedon Beach”, whose dct:valid interval starts in 2003 and finishes in 2011, and “Clevedon Sands”, whose dct:valid interval starts in 2011 and is left open)
        - find the right version by querying on the validity interval
        - simplify use through
          - a non-monotonic “latest value” link
          - an API to implement the query filters automatically
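A query over the validity intervals might look like the following sketch. The property names follow the slide's diagram, but the direction of the time: links and the bare year values are simplifications for illustration; the FILTER treats a missing time:intervalFinishes as an open (still current) period, which is the key trick of the pattern.

```sparql
# Sketch: find the version of an entity that was valid in 2010.
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX time: <http://www.w3.org/2006/time#>

SELECT ?version WHERE {
  ?endurant dct:hasVersion ?version .
  ?version  dct:valid ?interval .
  ?interval time:intervalStarts ?start .
  OPTIONAL { ?interval time:intervalFinishes ?end }
  # valid if it started by 2010 and either has not finished or finished later
  FILTER (?start <= 2010 && (!bound(?end) || ?end > 2010))
}
```

In the diagram's data this would select the “Clevedon Beach” version; an API layer can apply this filter automatically so most clients never write it.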
  • 23. Application to case study
        - weekly and annual samples
          - use the Data Cube pattern (n-ary relation)
        - withdrawn samples
          - replacement pattern (no explicit change event)
          - a Data Cube slice for the “latest valid assessment”, generated by a SPARQL Update query
          - the API gives easy access to the latest valid values
          - linked-data following, or a raw SPARQL query, allows drilling into the changes
        - changes to the bathing water profile
          - versioning pattern
          - the bathing water entity points to the latest profile (SPARQL Update again)
  • 24. From pilot to practice
        - reduce modelling costs
          - patterns
          - reuse
        - handling change and update
          - patterns
        - publication process
          - automation
          - conversion (dive 3)
          - publication
        - embed in the business process
          - use internally as well as externally
          - publish once, use many
          - data platform
  • 25. Automation
        Transform and publish data feed increments
        - transformation engine service
          - reusable mappings, low cost to adapt to new feeds
          - linking to reference data
        - publication service that supports non-monotonic changes
        (diagram: data increments (csv) → transform service, driven by xform specs and reconciliation against reference data → publication service → replicated publication servers)
  • 26. Transformation service
        - declarative specification of the transform
          - a single service supports a range of transformations
          - easy to adapt a transformation to new feeds and modelling changes
        - R2RML – RDB to RDF Mapping Language
          - specifies a mapping from database tables to RDF triples
          - W3C candidate recommendation
        - D2RML
          - an R2RML extension that treats a CSV feed as a database table
  • 27. Small D2RML example

        :dataSource a dr:CSVDataSource ;
            rdfs:label "dataSource" .

        :bathingWaterTermMap a dr:SubjectMap ;
            dr:template "http://environment.data.gov.uk/id/bathing-water/{EUBWID2}" ;
            dr:class def-bw:BathingWater .

        :bathingWaterMap
            dr:logicalTable :dataSource ;
            dr:subjectMap :bathingWaterTermMap ;
            dr:predicateObjectMap [
                dr:predicate rdfs:label ;
                dr:objectMap [ dr:column "description_english" ; dr:language "en" ]
            ] ;
            dr:predicateObjectMap [
                dr:predicate def-bw:eubwidNotation ;
                dr:objectMap [ dr:column "EUBWID2" ; dr:datatype def-bw:eubwid ]
            ] .
  • 28. Using patterns
        - verbosity is a problem and increases reuse costs
        - extend D2RML to support modelling patterns
        - Data Cube
          - specify a mapping to an observation with measures and dimensions
          - the engine generates the Data Set and Data Structure Definition automatically
  • 29. D2RML cube map example

        :dataCubeMap a dr:DataCubeMap ;
            rr:logicalTable "dataSource" ;
            dr:datasetIRI "http://example.org/datacube1"^^xsd:anyURI ;
            dr:dsdIRI "http://example.org/myDsd"^^xsd:anyURI ;
            # instances will automatically link to the base Data Set
            dr:observationMap [
                rr:subjectMap [
                    rr:termType rr:IRI ;
                    rr:template "http://example.org/observation/{PLACE}/{DATE}"
                ] ;
                # implies an entry in the auto-generated Data Structure Definition
                rr:componentMap [
                    dr:componentType qb:measure ;
                    rr:predicate aq:concentration ;
                    # defines how the measure value is to be represented
                    rr:objectMap [ rr:column "NO2" ; rr:datatype xsd:decimal ]
                ]
            ] ;
            ...
  • 30. But what about linking?
        - connect observations to reference data
          - a core value of linked data
        - R2RML has Term Maps to create values
          - constants and templates
        - extend to allow maps based on other data sources
          - Lookup map: look up a resource in a store, fetch a predicate
          - Reconcile: specify a lookup in a remote service, using the Google Refine reconciliation API
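As a purely hypothetical sketch of what such extensions could look like: the slide names the Lookup and Reconcile map types but does not show their syntax, so the dr:LookupMap / dr:ReconcileMap terms, all of their properties, and the service URLs below are invented for illustration only.

```turtle
# HYPOTHETICAL syntax, following the style of the D2RML examples above.

# Lookup map: find the resource whose def-bw:siteCodeNotation matches the
# value in the SITE_CODE column of the CSV feed, in a local reference store.
:siteTermMap a dr:LookupMap ;
    dr:lookupStore    <http://example.org/reference-data> ;
    dr:lookupColumn   "SITE_CODE" ;
    dr:lookupProperty def-bw:siteCodeNotation .

# Reconcile map: resolve the REGION_NAME column against a remote service
# that speaks the Google Refine reconciliation API.
:regionTermMap a dr:ReconcileMap ;
    dr:reconcileService <http://example.org/reconcile> ;
    dr:reconcileColumn  "REGION_NAME" .
```

Either map would then be used as the object map of a dr:predicateObjectMap, so that observations link to reference-data URIs instead of repeating literal codes.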
  • 31. Automation
        Transform and publish data feed increments
        - transformation engine service ✓
          - reusable mappings, low cost to adapt to new feeds ✓
          - linking to reference data ✓
        - publication service that supports non-monotonic changes
        (diagram as on slide 25)
  • 32. Publication service
        - goals
          - cope with the non-monotonic effects of the change representation
          - so that replication is robust and cheap (=> make it idempotent)
        - solution
          - SPARQL Update
          - publish each transformed increment as a simple DATA INSERT
          - then run a SPARQL Update script for the non-monotonic links
            - dct:isReplacedBy links
            - “latest value” slices
  • 33. Sample update script

        DELETE { ?bw bwq:latestComplianceAssessment ?o . }
        WHERE  { ?bw bwq:latestComplianceAssessment ?o . } ;

        INSERT { ?bw bwq:latestComplianceAssessment ?o . }
        WHERE {
          { ?slice a bwq:ComplianceByYearSlice ;
                   bwq:sampleYear [ interval:ordinalYear ?year ] .
            OPTIONAL {
              ?slice2 a bwq:ComplianceByYearSlice ;
                      bwq:sampleYear [ interval:ordinalYear ?year2 ] .
              FILTER (?year2 > ?year)
            }
            FILTER ( !bound(?slice2) )
          }
          ?slice qb:observation ?o .
          ?o bwq:bathingWater ?bw .
        }
  • 34. Automation
        Transform and publish data feed increments
        - transformation engine service ✓
          - reusable mappings, low cost to adapt to new feeds ✓
          - linking to reference data ✓
        - publication service that supports non-monotonic changes ✓
        (diagram as on slide 25)
  • 35. Application to case study
        - update server
          - transforms based on scripts (an earlier scripting utility)
          - linking to reference data
          - distributed publication via SPARQL Update
        - extensible range of data sets
          - annual assessments
          - in-season assessments
          - bathing water profile
          - features (e.g. pollution sources)
          - reference data
  • 36. From pilot to practice
        - reduce modelling costs
          - patterns
          - reuse
        - handling change and update
          - patterns
        - publication process
          - automation
          - conversion
          - publication
        - embed in the business process (dive 4)
          - use internally as well as externally
          - publish once, use many
          - data platform
  • 37. Embed in the business process
        - embedding is critical to ensure the data is kept up to date
        - that in turn needs usage => lower the barrier to use
        (diagram: two loops – if the data is not used it goes stale and investment is hard to justify; external and internal use keep the data rich and up to date, justifying investment)
  • 38. Lowering the barrier to use
        - simple REST APIs
          - use the Linked Data API specification
          - rich query without learning SPARQL
          - easy consumption as JSON or XML
          - gets developers used to the data and the data model
        (diagram: transform service → publication service → LD API)
  • 39. Application to case study
        - embedded in the process for weekly/daily updates
        - infrastructure to automate conversion and publishing
        - API plus extensive developer documentation
        - third-party and in-house applications built over the API
        - publish once, use many
          - information products as applications over a data platform, usable externally as well as internally
  • 40. The next stage
        - grow the range of data publications and uses
        - a range of reference data and sets brings new challenges
          - discover reference terms and models to reuse
          - discover datasets to use for an application
          - discover models and links between sets
        - needs a coordination or registry service
          - a story for another day ...
  • 41. Conclusions
        - illustrated how public sector users of linked data are moving from static pilots to operational systems
        - the keys are:
          - reduce modelling costs through patterns and reuse
          - design for continuous update
          - automate publication using declarative mappings and SPARQL Update
          - lower the barrier to use through API design and documentation
          - embed in the organization’s process so the data is used and useful
        Acknowledgements
        Only possible thanks to many smart colleagues: Stuart Williams, Andy Seaborne, Ian Dickinson, Brian McBride, Chris Dollin, plus Alex Coley and team from the Environment Agency