FRS and Linked Open Data Potential –
Conceptual Discussion v 1.3
November 30, 2010



                                                                          Dave Smith
                                                             USEPA/OEI/OIC/IESD/ISSB
                                                                smith.davidg@epa.gov
                                                                         202-566-0797


Document Change History
  Revision    Date          Author            Description
  1.0         11/12/2010    David G. Smith    Initial Version
  1.1         11/24/2010    David G. Smith    Minor updates/revisions as follow-on to 11/23 discussion
  1.2         11/29/2010    David G. Smith    Collaborations, potential pilots, FOAF and other models
  1.3         11/30/2010    David G. Smith    Additional collaborations and detail on facility granularity concept
FRS Data Model Initial Conceptual Discussion
                             November 11, 2010                                                       November 30, 2010




 Contents
 Document Change History
 Introduction:
 Concept:
 Current Situation:
 Linked Open Data Issues:
 Data Model Issues:
 Linked Open Data Development:
 Existing Resources
 Short-Term data needs:
 Potential Pilots
 Longer-Range, Emergent data needs:
 Other Ongoing, Related Activities




Introduction:
The intent of this concept paper is to explore some conceptual, blue-sky, no-constraints ideas for
potential improvements to the FRS Linked Open Data approach being published via Data.gov, and to
stimulate additional ideas and brainstorming. Follow-on work will examine alternatives, set
priorities and finalize thoughts toward implementation.


Concept:
Provide enhancements to the FRS Linked Open Data approach to improve analysis, enhance facility
representation, improve robustness of LOD querying and analytics, integrate other existing metadata
capabilities and improve capabilities to support Semantic Web approaches, such as more-informed RDF
serialization.





Current Situation:
FRS data is currently published via Data.gov, for example through the RDF button on the Data.gov
catalog pages for FRS data (e.g. http://www.data.gov/raw/1030).




                               Figure 1: Example of Current FRS RDF Offering (highlighted in red box)



The data returned is tied to a data.gov URL, e.g.
http://www.data.gov/semantic/data/alpha/1030/dataset-1030.rdf.gz


Linked Open Data Issues:
Currently, FRS and other datasets published via Data.gov are being serialized as RDF to support semantic
web and linked open data. The basic problems with the Data.gov RDF are not specific to the FRS RDF
data; they likely apply across the board.

Firstly, in terms of access, the data is a gzipped download. Data must be downloaded and unzipped
before it can be accessed. Ideally, Data.gov would serve the data up as a SPARQL endpoint, as a
SESAME repository, or through some other means of serving up a triple store. The
download/unzip paradigm does not lend itself to dynamic mashups.
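
To illustrate the client-side work that the download/unzip paradigm implies, here is a minimal Python sketch using only the standard library. It assumes the gzipped payload has already been fetched as bytes. The tiny sample document, its `http://example.org/frs#` default namespace and the helper name `entries_from_gzipped_rdf` are illustrative assumptions, not the actual Data.gov file.

```python
import gzip
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical stand-in for the gzipped Data.gov download (not the real file).
SAMPLE_RDF = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/frs#">
  <rdf:Description rdf:about="#entry9985">
    <state_code>NE</state_code>
    <registry_id>110006555085</registry_id>
  </rdf:Description>
</rdf:RDF>"""


def entries_from_gzipped_rdf(payload: bytes) -> dict:
    """Decompress a gzipped RDF/XML payload and index literal properties by entry."""
    root = ET.fromstring(gzip.decompress(payload))
    entries = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        # Strip the "{namespace}" prefix ElementTree puts on each tag name.
        entries[about] = {child.tag.split("}")[-1]: (child.text or "").strip()
                          for child in desc}
    return entries


# The whole file must be decompressed and parsed before a single fact is usable.
entries = entries_from_gzipped_rdf(gzip.compress(SAMPLE_RDF))
print(entries["#entry9985"]["state_code"])
```

By contrast, a SPARQL endpoint would let a client request only the facts it needs over HTTP, which is what a dynamic mashup requires.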


The Data.gov RDF itself appears to be a brute-force serialization of data tables into RDF. It
lacks the semantic depth needed to support analysis (see the samples in Figures 1 and 2 below).


      <rdf:Description rdf:about="#entry9985">
            <hdatum_desc>NAD83</hdatum_desc>
            <state_name>NEBRASKA</state_name>
            <latitude83>40.944623</latitude83>
            <interest_types>STATE MASTER</interest_types>
            <city_name>GARLAND</city_name>
            <create_date>01-MAR-00</create_date>
            <frs_facility_detail_report_url rdf:resource="http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility?p_registry_id=110006555085"/>
            <congressional_dist_num>01</congressional_dist_num>
            <pgm_sys_acrnms>NE-IIS</pgm_sys_acrnms>
            <epa_region_code>07</epa_region_code>
            <country_name>USA</country_name>
            <fips_code>31159</fips_code>
            <huc_code>10200203</huc_code>
            <collect_desc>ADDRESS MATCHING-HOUSE NUMBER</collect_desc>
            <primary_name>TERRI KELLER RESIDENCE</primary_name>
            <rdf:type rdf:resource="http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#DataEntry"/>
            <ref_point_desc>ENTRANCE POINT OF A FACILITY OR STATION</ref_point_desc>
            <postal_code>683609338</postal_code>
            <registry_id>110006555085</registry_id>
            <location_address>1976 OLD MILL RD</location_address>
            <accuracy_value>30</accuracy_value>
            <update_date>06-AUG-01</update_date>
            <county_name>SEWARD</county_name>
            <conveyor>FRS</conveyor>
            <longitude83>-96.990306</longitude83>
            <state_code>NE</state_code>
            <site_type_name>STATIONARY</site_type_name>
         </rdf:Description>




                                   Figure 1: Sample of current Data.gov FRS RDF/XML Representation





      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#hdatum_desc> "NAD83" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#state_name> "NEBRASKA" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#latitude83> "40.944623" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#interest_types> "STATE MASTER" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#city_name> "GARLAND" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#create_date> "01-MAR-00" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#frs_facility_detail_report_url> <http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility?p_registry_id=110006555085> .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#congressional_dist_num> "01" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#pgm_sys_acrnms> "NE-IIS" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#epa_region_code> "07" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#country_name> "USA" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#fips_code> "31159" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#huc_code> "10200203" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#collect_desc> "ADDRESS MATCHING-HOUSE NUMBER" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#primary_name> "TERRI KELLER RESIDENCE" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#DataEntry> .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#ref_point_desc> "ENTRANCE POINT OF A FACILITY OR STATION" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#postal_code> "683609338" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#registry_id> "110006555085" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#location_address> "1976 OLD MILL RD" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#accuracy_value> "30" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#update_date> "06-AUG-01" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#county_name> "SEWARD" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#conveyor> "FRS" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#longitude83> "-96.990306" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#state_code> "NE" .
      <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#site_type_name> "STATIONARY" .

                                   Figure 2: Sample of current Data.gov FRS Representation as Triples



The current RDF serialization is essentially a brute-force conversion; there is plenty of opportunity
to enhance and improve it.

The properties are things that some EPA users might easily understand, but would others? For example,
are huc_code and pgm_sys_acrnms uniquely identifiable and understood within this dataset?
A reference to the EPA data dictionary, perhaps an EPA namespace, or some other means of defining
them more positively is needed. We have a lot of metadata that we can bring into the mix to
enhance the identifiability, understandability and usability of the RDF data.
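
As a rough sketch of what defining properties "more positively" could look like, the snippet below maps bare column names onto URIs in a namespace and attaches human-readable labels. The namespace `http://example.org/epa/frs#`, the label wording and the function names are assumptions for illustration only; authoritative definitions would come from the EPA data dictionary.

```python
EPA_NS = "http://example.org/epa/frs#"  # hypothetical namespace, not an official EPA URI
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Illustrative labels; authoritative wording would come from the EPA data dictionary.
PROPERTY_LABELS = {
    "huc_code": "Hydrologic Unit Code",
    "pgm_sys_acrnms": "Program system acronyms",
    "registry_id": "FRS registry identifier",
}


def property_uri(column: str) -> str:
    """Turn a bare column name into a globally unique property URI."""
    return EPA_NS + column


def label_triples(labels: dict) -> list:
    """N-Triples lines that attach an rdfs:label to each namespaced property."""
    return [f'<{property_uri(col)}> <{RDFS_LABEL}> "{text}" .'
            for col, text in sorted(labels.items())]


for line in label_triples(PROPERTY_LABELS):
    print(line)
```

With properties anchored in a shared namespace, an external user can dereference or look up a term rather than guess at its meaning.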

There isn't really much structure or model; it's essentially a flat table. Everything is treated as
an alphanumeric data type, so there is no temporal intelligence to dates, et cetera. It doesn't identify
registry ID as something unique or indexable. There are many things that can and should be defined
better. There is probably a semantic analogue to our data model that we can develop in RDF/OWL/etc.
and then map to.
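
A minimal sketch of adding datatype awareness, assuming the `DD-MON-YY` date format seen in the sample record. The two-digit-year pivot, the choice of XSD datatype per column and the function names are assumptions made for illustration.

```python
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"
MONTHS = {m: i for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_frs_date(text: str, pivot: int = 70) -> date:
    """Parse 'DD-MON-YY'; years below `pivot` are read as 20YY (an assumption)."""
    day, mon, yy = text.strip().split("-")
    year = int(yy) + (2000 if int(yy) < pivot else 1900)
    return date(year, MONTHS[mon.upper()], int(day))


def typed_literal(column: str, value: str) -> str:
    """Render a value as an N-Triples literal, with an XSD datatype where one applies."""
    if column in ("create_date", "update_date"):
        return f'"{parse_frs_date(value).isoformat()}"^^<{XSD}date>'
    if column in ("latitude83", "longitude83"):
        return f'"{value}"^^<{XSD}decimal>'
    return f'"{value}"'  # identifiers such as registry_id stay plain strings


print(typed_literal("create_date", "01-MAR-00"))
print(typed_literal("latitude83", "40.944623"))
```

Typed dates and coordinates are what allow range filters and sorting in a query, rather than string comparison.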

One approach which may make more sense is to go back and look at the relational database model,
which can support more richness – essentially, individual tables and their relationships would be
generated as Linked Open Data, and the SPARQL queries would then have the flexibility of current SQL
queries.
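
A rough sketch of the table-by-table idea: each row becomes a resource, and foreign keys become links between resources instead of repeated literals. The URI base, the second table and its column names are hypothetical; only the sample values come from the record shown earlier.

```python
BASE = "http://example.org/epa/frs/"  # hypothetical URI base

# Two toy "tables"; the split into tables and the column names are illustrative assumptions.
facilities = [{"registry_id": "110006555085", "primary_name": "TERRI KELLER RESIDENCE"}]
program_interests = [{"registry_id": "110006555085", "pgm_sys_acrnm": "NE-IIS",
                      "interest_type": "STATE MASTER"}]


def table_to_triples(rows, table, key_cols, foreign_keys=None):
    """Emit (subject, predicate, object) tuples; foreign keys become URI links."""
    foreign_keys = foreign_keys or {}
    triples = []
    for row in rows:
        subject = BASE + table + "/" + "/".join(row[k] for k in key_cols)
        for col, value in row.items():
            predicate = BASE + "def#" + col
            if col in foreign_keys:
                # Link to the row in the referenced table instead of repeating a literal.
                triples.append((subject, predicate, BASE + foreign_keys[col] + "/" + value))
            else:
                triples.append((subject, predicate, value))
    return triples


triples = (table_to_triples(facilities, "facility", ["registry_id"])
           + table_to_triples(program_interests, "interest",
                              ["registry_id", "pgm_sys_acrnm"],
                              foreign_keys={"registry_id": "facility"}))
for t in triples:
    print(t)
```

Because the interest row points at the facility resource, a query can traverse the link much as a SQL join follows a foreign key.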

Regarding the properties, are there other namespaces that we could or should be leveraging in some
cases? geo: is one example; however, our data is NAD83, and geo: assumes WGS84. One possibility is to
reproject to WGS84 and provide geo: values to supplement what we have. Similarly, foaf: or other
namespaces that deal with addresses and points of contact may apply. The RDF currently carries only
locations, but FRS also has contacts, should we at some point incorporate those as well.

In summary, I think the offering could be improved in two ways. The first is accessibility (SPARQL, et
cetera; I think Data.gov needs to look at that from a services infrastructure standpoint). The second is
usability, by following more of a data model approach as opposed to this flat mapping, and by mapping
to existing namespaces and following existing models where appropriate. We should be able to leverage
some of our metadata elements, data models and other artifacts toward a better representation and
mapping.





Data Model Issues:
Over the long range, some additional tweaks to the FRS data model may be needed in order to enhance
data representation and better support Linked Open Data. Some of these are described briefly below.


Linked Open Data Development:
Potential collaboration with

   •   Joshua Lieberman (OGC Geospatial Semantics SWG)

   •   Spatial Ontology Community of Practice

   •   Jim Hendler (RPI), George Thomas (HHS): CIO Council and Data.gov Geospatial Semantics
       threads

   •   John Harman / Michael Pendleton (LOD, SRS)

   •   Steve Young / Zach Scott / Open Gov Team (LOD)

   •   Talis, pending contract (LOD)

   •   TRI Program (Potential Pilot)

   •   Kevin Kirby (Data Model)

   •   Tom Giffen (Data Model, Business Rules)

   •   Ken Blumberg (Business Rules)

   •   Cindy Dickinson (Standards, Business Rules)

   •   Others (program offices, regions, GISWG)


Existing Resources
   •   Leverage the data modeling work that Kevin Kirby has been doing

   •   Drill into gist.owl and other potential resources


Short-Term data needs:
   •   Semantic Enhancements / Linked Open Data
       Improvement of capabilities for supporting Linked Open Data applications:
       analysis of data structure toward supporting faceted, dimensional analyses (Figure 3);
       development of URI schemes, potentially namespaces, and means and approaches for allowing
       unique identification and linkage




       [Figure 3: Potential Facets / Dimensions for Analysis and Semantic Enhancement.
       The diagram places Site at the center, surrounded by: People (Administrative POC, Legal POC,
       Operational POC); Organizational Dimension (Site-level Organizational Affiliation, Ultimate
       Organizational Parent); Spatial Dimension (Lat/Long, Physical USPS Address, Municipality,
       HUC Code); Temporal Dimension; Regulatory Dimension (Program IDs); Activity (NAICS Code,
       SIC Code).]




   •   Semantic Dimensions:
       Explore various dimensions of facility:

       •   Spatial –

               o   GML representation of absolute location (lat/long, etc)

               o   Spatial representation framework for facility (building footprints, parcel boundary,
                   others for future)

               o   Facility data modeling granularity and relationships - get a better handle on what
                   the facility "thing" represents, and its relation to other things - for example, a parcel
                   boundary, containing an industrial complex with manufacturing and storage
                   buildings (differing NAICS, possibly even different companies operating and
                   licensed/permitted), plus associated air stacks, SPCC measures, water outfalls, et
                   cetera. When we pull up "facility" it should ultimately reflect that bigger picture for
                   context, with the component of interest highlighted.


       •   Temporal

               o   Data currency

               o   Temporal aspects to regulation, enforcement, permitting, et cetera – future


       •   Corporate Dimension

               o   Corporate ownership – at facility level and at ultimate corporate parent level


       •   Function - Activity and Use

               o   NAICS/SIC Codes

               o   EPA Regulatory program

               o   EPA Interest Type

               o   Linkages / translation between interest type and other ontologies/vocabularies

               o   Linkages to regulatory programs and other components


       •   Interrelationships of facilities (future)


       •   Individuals

               o   Friend-of-a-friend (FOAF) and other existing RDF constructs


       •   Many other potential enhancements


Potential Pilots
A number of potential pilots for mashups can be considered. The “low-hanging fruit” for OEI may be
pilots that build upon known internal assets, i.e.


    •   FRS
    •   TRI (Toxic Release Quantities for Given Location)
    •   SRS (Substance)
As one potential scenario, one could tie TRI discharges to reaches via OW web services and TRI
reported receiving waters, and then tie this to observed impacts downstream.

One caveat of using EPA data is that it is known to EPA users, but ideally needs to be more fully
fleshed out to make it discoverable and uniquely identifiable for external users, perhaps via embedded
EPA identifiers (perhaps an epa: namespace or similar means of identifying our assets).

Other potential scenarios are TBD, for example OECA targeted enforcement vs. OSHA, or OPP vs. USDA
pesticides application data.


Longer-Range, Emergent data needs:
These are not specific to LOD, but are instead emergent attributes of interest for FRS – LOD approaches
may help inform on how to structure these.

    •   HUC Codes
        Completing the prepopulation of HUC Codes can support identification of facilities impacting
        major watersheds, e.g. Chesapeake Bay (OECA need). Other potential needs: airsheds


    •   Municipality
        Toward improving data quality: the physical street address may carry a ZIP Code city name that
        differs from the actual municipality where the site resides. For example, Suburban Drive, State
        College, PA is actually in Ferguson Township, PA, so the local planning and building code
        officials and emergency responders who either have or need information on the facility of
        interest would differ from those of the listed city.


    •   Relationship
        Ability to relate facilities – relating individual components of a larger system of infrastructure,
        such as relating a gas terminal to a compressor station – changes to one may impact others.
        Ability to organize information in appropriate fashions, such as relating multiple individual oil
        platforms with discrete permits to a lease boundary with another level of permitting.


    •   Indian Country
        More robust identification/validation of facilities which may lie within tribal boundaries –
        refinement of IND-3 boundaries with other source data, analysis of flows containing either tribal
        flag (Y/N) and/or tribal identifier (tribe/reservation name) - (collaboration with Elizabeth
        Jackson / Ed Liu)


    •   Facility Definition
        Potential broadening of scope and use of FRS to accommodate grant award locations and other
        types of locations – 2005 NAPA Report recommendations for consistent agencywide site
        identification. May be predicated on buildout of other capabilities, such as being able to relate
        sites.
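
For the HUC Codes item in the list above, a minimal sketch of how prepopulated codes could support watershed-level selection. It relies on the fact that hydrologic unit codes nest by prefix (2-digit region, 4-digit subregion, 6-digit basin, 8-digit subbasin). The second and third records and the function names are hypothetical.

```python
# HUC digits nest by prefix: 2 = region, 4 = subregion, 6 = basin, 8 = subbasin.
HUC_LEVELS = {2: "region", 4: "subregion", 6: "basin", 8: "subbasin"}

facilities = [
    {"registry_id": "110006555085", "huc_code": "10200203"},    # sample record from this paper
    {"registry_id": "HYPOTHETICAL-1", "huc_code": "02070010"},  # made-up record for contrast
    {"registry_id": "HYPOTHETICAL-2", "huc_code": None},        # HUC not yet prepopulated
]


def huc_ancestors(huc_code: str) -> dict:
    """Split an 8-digit HUC into its enclosing units, keyed by level name."""
    return {name: huc_code[:digits] for digits, name in HUC_LEVELS.items()}


def facilities_in_watershed(rows, huc_prefix: str) -> list:
    """Registry IDs whose HUC code falls inside the unit identified by `huc_prefix`."""
    return [r["registry_id"] for r in rows
            if r["huc_code"] and r["huc_code"].startswith(huc_prefix)]


print(huc_ancestors("10200203"))
print(facilities_in_watershed(facilities, "1020"))
```

Records without a HUC code simply drop out of such a selection, which is why completing the prepopulation matters for watershed analyses.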


Other Ongoing, Related Activities
A number of activities, internal and external, can help to inform on direction and data model for FRS
data collection and publishing activities – some of these are listed below:

    •   Potential EPA Corporate ID Workgroup
        Collaborate with TRI, TSCA, FRP, RMP and others who collect corporate parent information, as well
        as OECA and others who need corporate parent information to support analysis.

    •   White House Corporate ID Workgroup
        Collaborate with emergent White House Corporate ID workgroup – Beth Noveck / Steve Croley,
        SEC, Labor and other agencies to align, coordinate and collaborate on corporate identifiers

    •   OpenGov
        Collaboration with EPA Open Gov initiatives to inform on how best to publish data for external
        reuse.

    •   National Academy of Public Administration
        Follow-through on 2005 NAPA Report recommendations

    •   Spatial Ontology Community of Practice (SOCOP)
        Collaboration on vocabularies, standards and data modeling approaches

    •   Data.Gov Data Architecture Subgroup
        Collaboration on vocabularies, standards and data modeling approaches

    •   EPA OEI/OIC/IESD Data Standards Branch
        Collaboration on vocabularies, standards and data modeling approaches

    •   Others…


Anticipated Next Steps:
TBD, develop ideas for potential pilots, engage on “LOD Cookbook” and approaches for representing
and rendering our data as RDF.

                                                   11

Más contenido relacionado

Similar a FRS Linked Open Data Concept v1.3 20101130

Shareable Metadata for Visual Resources
Shareable Metadata for Visual ResourcesShareable Metadata for Visual Resources
Shareable Metadata for Visual ResourcesJenn Riley
 
Linked Open Government Data in UK
Linked Open Government Data in UKLinked Open Government Data in UK
Linked Open Government Data in UKreeep
 
Big data processing using - Hadoop Technology
Big data processing using - Hadoop TechnologyBig data processing using - Hadoop Technology
Big data processing using - Hadoop TechnologyShital Kat
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Mark Tabladillo
 
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...European Data Forum
 
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALSecrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALMark Tabladillo
 
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...Athens Big Data
 
DC-2008 Tutorial 3 - Dublin Core and other metadata schemas
DC-2008 Tutorial 3 - Dublin Core and other metadata schemasDC-2008 Tutorial 3 - Dublin Core and other metadata schemas
DC-2008 Tutorial 3 - Dublin Core and other metadata schemasMikael Nilsson
 
20100614 ISWSA Keynote
20100614 ISWSA Keynote20100614 ISWSA Keynote
20100614 ISWSA KeynoteAxel Polleres
 
RDFa From Theory to Practice
RDFa From Theory to PracticeRDFa From Theory to Practice
RDFa From Theory to PracticeAdrian Stevenson
 
More than Raw: Government Data Online
More than Raw: Government Data OnlineMore than Raw: Government Data Online
More than Raw: Government Data OnlineGordon Grace
 
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET Journal
 
3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data
3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data
3LD: Towards high quality, industry-ready Linguistic Linked Licensed DataDaniel Vila Suero
 
Integrating Government Data New
Integrating Government Data NewIntegrating Government Data New
Integrating Government Data Newguest4543bb
 
Big Data in NATO and Your Role
Big Data in NATO and Your RoleBig Data in NATO and Your Role
Big Data in NATO and Your RoleJay Gendron
 
Logical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsLogical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsDenodo
 

Similar a FRS Linked Open Data Concept v1.3 20101130 (20)

Shareable Metadata for Visual Resources
Shareable Metadata for Visual ResourcesShareable Metadata for Visual Resources
Shareable Metadata for Visual Resources
 
Linked Open Government Data in UK
Linked Open Government Data in UKLinked Open Government Data in UK
Linked Open Government Data in UK
 
Big data processing using - Hadoop Technology
Big data processing using - Hadoop TechnologyBig data processing using - Hadoop Technology
Big data processing using - Hadoop Technology
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
FRS Linked Open Data Concept v1.3 20101130

FRS and Linked Open Data Potential –
Conceptual Discussion v 1.3
November 30, 2010

Dave Smith
USEPA/OEI/OIC/IESD/ISSB
smith.davidg@epa.gov
202-566-0797

Document Change History
  Revision   Date         Author            Description
  1.0        11/12/2010   David G. Smith    Initial Version
  1.1        11/24/2010   David G. Smith    Minor updates/revisions as follow-on to 11/23 discussion
  1.2        11/29/2010   David G. Smith    Collaborations, potential pilots, FOAF and other models
  1.3        11/30/2010   David G. Smith    Additional collaborations and detail on facility granularity concept
FRS Data Model Initial Conceptual Discussion
November 11, 2010    November 30, 2010

Contents
  Document Change History
  Introduction
  Concept
  Current Situation
  Linked Open Data Issues
  Data Model Issues
  Linked Open Data Development
  Existing Resources
  Short-Term data needs
  Potential Pilots
  Longer-Range, Emergent data needs
  Other Ongoing, Related Activities

Introduction:
The intent of this concept paper is to explore some conceptual, blue-sky, no-constraints ideas for potential improvements to the FRS Linked Open Data approach being published via data.gov, and to stimulate additional ideas and brainstorming. A follow-on to this will examine alternatives, set priorities and finalize thoughts toward implementation.

Concept:
Provide enhancements to the FRS Linked Open Data approach to improve analysis, enhance facility representation, improve robustness of LOD querying and analytics, integrate other existing metadata capabilities and improve capabilities to support Semantic Web approaches, such as more-informed RDF serialization.
Current Situation:
FRS data is currently being published via Data.gov, e.g. through the RDF button on the Data.gov catalog pages for FRS data (e.g. http://www.data.gov/raw/1030).

Figure 1: Example of Current FRS RDF Offering (highlighted in red box)

The data returned is tied to a data.gov URL, e.g. http://www.data.gov/semantic/data/alpha/1030/dataset-1030.rdf.gz

Linked Open Data Issues:
Currently, FRS and other datasets published via Data.gov are being serialized as RDF to support semantic web and linked open data. The basic problems with the Data.gov RDF are not specific to the FRS RDF data; they likely apply across the board.

Firstly, in terms of access, the data is a gzipped download. It must be downloaded and unzipped before it can be accessed. Ideally, Data.gov would serve the data up as a SPARQL endpoint, as a SESAME repository, or by some other means of serving up a triple store. The download/unzip paradigm does not lend itself to dynamic mashups.
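As an illustration of the download/unzip step described above, a minimal Python sketch follows. It uses only the standard library and works on an in-memory stand-in rather than the real file; the sample content, the default namespace and the helper names load_gzipped_rdf and count_entries are assumptions made for this example, not part of any Data.gov tooling.

```python
import gzip
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Stand-in for the contents of a downloaded dataset file (assumed shape,
# including the default namespace, which is inferred from the triples form).
SAMPLE_RDF_XML = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#">
  <rdf:Description rdf:about="#entry9985">
    <state_code>NE</state_code>
  </rdf:Description>
</rdf:RDF>"""


def load_gzipped_rdf(payload: bytes) -> ET.Element:
    """Decompress a gzipped RDF/XML payload and parse it (hypothetical helper)."""
    return ET.fromstring(gzip.decompress(payload))


def count_entries(root: ET.Element) -> int:
    """Count the rdf:Description elements directly under the document root."""
    return len(root.findall(f"{{{RDF_NS}}}Description"))


# Simulate the download by compressing the sample in memory.
payload = gzip.compress(SAMPLE_RDF_XML)
root = load_gzipped_rdf(payload)
print(count_entries(root))
```

Every consumer has to repeat this whole-file step before asking even a simple question of the data, which is the cost a query endpoint would avoid.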
The Data.gov RDF itself appears to be a brute-force serialization of data tables into RDF. It lacks the semantic depth that analysis would need (see Fig. 1-3).

<rdf:Description rdf:about="#entry9985">
  <hdatum_desc>NAD83</hdatum_desc>
  <state_name>NEBRASKA</state_name>
  <latitude83>40.944623</latitude83>
  <interest_types>STATE MASTER</interest_types>
  <city_name>GARLAND</city_name>
  <create_date>01-MAR-00</create_date>
  <frs_facility_detail_report_url rdf:resource="http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility?p_registry_id=110006555085"/>
  <congressional_dist_num>01</congressional_dist_num>
  <pgm_sys_acrnms>NE-IIS</pgm_sys_acrnms>
  <epa_region_code>07</epa_region_code>
  <country_name>USA</country_name>
  <fips_code>31159</fips_code>
  <huc_code>10200203</huc_code>
  <collect_desc>ADDRESS MATCHING-HOUSE NUMBER</collect_desc>
  <primary_name>TERRI KELLER RESIDENCE</primary_name>
  <rdf:type rdf:resource="http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#DataEntry"/>
  <ref_point_desc>ENTRANCE POINT OF A FACILITY OR STATION</ref_point_desc>
  <postal_code>683609338</postal_code>
  <registry_id>110006555085</registry_id>
  <location_address>1976 OLD MILL RD</location_address>
  <accuracy_value>30</accuracy_value>
  <update_date>06-AUG-01</update_date>
  <county_name>SEWARD</county_name>
  <conveyor>FRS</conveyor>
  <longitude83>-96.990306</longitude83>
  <state_code>NE</state_code>
  <site_type_name>STATIONARY</site_type_name>
</rdf:Description>

Figure 1: Sample of current Data.gov FRS RDF/XML Representation
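To make the "brute-force serialization" point concrete, the following Python sketch reads an abbreviated copy of the sample entry above and shows that it reduces to a flat dictionary of strings. The wrapping rdf:RDF element, the default namespace and the helper name flatten_entry are assumptions added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Abbreviated copy of the sample entry; the enclosing rdf:RDF element and
# the default namespace are assumed, since the excerpt omits them.
SAMPLE_ENTRY = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#">
  <rdf:Description rdf:about="#entry9985">
    <hdatum_desc>NAD83</hdatum_desc>
    <latitude83>40.944623</latitude83>
    <create_date>01-MAR-00</create_date>
    <registry_id>110006555085</registry_id>
    <update_date>06-AUG-01</update_date>
    <longitude83>-96.990306</longitude83>
  </rdf:Description>
</rdf:RDF>"""


def flatten_entry(description: ET.Element) -> dict:
    """Return {property name: value} for one entry (hypothetical helper)."""
    record = {}
    for child in description:
        local_name = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        resource = child.get(f"{{{RDF_NS}}}resource")
        value = resource if resource is not None else (child.text or "")
        record[local_name] = value.strip()
    return record


root = ET.fromstring(SAMPLE_ENTRY)
record = flatten_entry(root.find(f"{{{RDF_NS}}}Description"))
for name, value in record.items():
    print(name, repr(value), type(value).__name__)
```

Coordinates, dates and the registry identifier all come back as untyped strings, one per table column, with no structure beyond the row itself.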
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#hdatum_desc> "NAD83" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#state_name> "NEBRASKA" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#latitude83> "40.944623" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#interest_types> "STATE MASTER" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#city_name> "GARLAND" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#create_date> "01-MAR-00" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#frs_facility_detail_report_url> <http://iaspub.epa.gov/enviro/fii_query_detail.disp_program_facility?p_registry_id=110006555085> .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#congressional_dist_num> "01" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#pgm_sys_acrnms> "NE-IIS" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#epa_region_code> "07" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#country_name> "USA" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#fips_code> "31159" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#huc_code> "10200203" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#collect_desc> "ADDRESS MATCHING-HOUSE NUMBER" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#primary_name> "TERRI KELLER RESIDENCE" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#DataEntry> .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#ref_point_desc> "ENTRANCE POINT OF A FACILITY OR STATION" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#postal_code> "683609338" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#registry_id> "110006555085" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#location_address> "1976 OLD MILL RD" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#accuracy_value> "30" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#update_date> "06-AUG-01" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#county_name> "SEWARD" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#conveyor> "FRS" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#longitude83> "-96.990306" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#state_code> "NE" .
<http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#entry9985> <http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#site_type_name> "STATIONARY" .

Figure 2: Sample of current Data.gov FRS Representation as Triples

The current RDF serialization is essentially just a brute-force conversion, so there is plenty of opportunity to enhance and improve it. The properties are things that some EPA users might easily understand, but would others? For example, are huc_code and pgm_sys_acrnms uniquely identifiable and understood within this dataset? A reference to the EPA data dictionary, an EPA namespace, or some other means of defining them more positively is probably needed. We have a lot of metadata that we can bring into the mix toward enhancing the identifiability, understandability and usability of the RDF data.

There isn't really much structure or model; it is essentially a flat table. Everything is treated as an alphanumeric data type. Dates carry no temporal intelligence, et cetera. The registry ID is not identified as something unique or indexable. There are many things that can and should be defined better. There is probably a semantic analogue to our data model that we can develop in RDF/OWL/etc. and then map to.

One approach which may make more sense is to go back and look at the relational database model, which can support more richness: individual tables and their relationships would be generated as Linked Open Data, and SPARQL queries would then have the flexibility of current SQL queries.

Regarding the properties, are there in some cases other namespaces that we could/should be leveraging?
geo: is one example. Our data is, however, NAD83, and geo: assumes WGS84. As one possibility, we could reproject to WGS84 and provide geo: values to supplement what we have. Similarly, foaf: or other namespaces which deal with addresses and points of contact may apply. The RDF currently carries only locations, but FRS also has contacts, should we at some point incorporate those as well.

In summary, the RDF could stand to be improved in two ways. The first is accessibility (SPARQL, et cetera), which Data.gov needs to look at from a services infrastructure standpoint. The second is usability: following more of a data model approach as opposed to this flat mapping, mapping to existing namespaces and following existing models where appropriate. We should be able to leverage some of our metadata elements, data models and other artifacts toward a better representation and mapping.
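As one possible reading of the typing and identification concerns raised above, the sketch below turns a few of the flat string values into typed N-Triples literals and mints a facility URI from the registry ID. The base URI http://example.org/frs/facility/ and the helper names are hypothetical, the two-digit years are interpreted using Python's default pivot (00-68 as 2000-2068), month abbreviations assume an English locale, and the NAD83 coordinates are deliberately left under the dataset's own properties rather than geo:, since no reprojection is performed here.

```python
from datetime import date, datetime

XSD = "http://www.w3.org/2001/XMLSchema#"
DATASET_NS = "http://www.data.gov/semantic/data/alpha/997/dataset-997.rdf#"
FACILITY_BASE = "http://example.org/frs/facility/"  # hypothetical URI scheme

# Values copied from the sample entry, as the untyped strings it carries.
RECORD = {
    "registry_id": "110006555085",
    "latitude83": "40.944623",
    "longitude83": "-96.990306",
    "create_date": "01-MAR-00",
    "update_date": "06-AUG-01",
}


def parse_frs_date(text: str) -> date:
    """Parse a DD-MON-YY string; the century comes from strptime's %y rule."""
    return datetime.strptime(text, "%d-%b-%y").date()


def typed_triples(record: dict) -> list:
    """Emit N-Triples lines with xsd datatypes for one record (illustrative)."""
    subject = f"<{FACILITY_BASE}{record['registry_id']}>"
    lines = []
    for name in ("latitude83", "longitude83"):
        # Still NAD83: typed as decimals, not relabelled as WGS84 geo: values.
        lines.append(
            f'{subject} <{DATASET_NS}{name}> "{record[name]}"^^<{XSD}decimal> .'
        )
    for name in ("create_date", "update_date"):
        iso = parse_frs_date(record[name]).isoformat()
        lines.append(f'{subject} <{DATASET_NS}{name}> "{iso}"^^<{XSD}date> .')
    return lines


triples = typed_triples(RECORD)
for line in triples:
    print(line)
```

With typed dates and a stable subject URI per registry ID, a query can filter by date range or join on the facility identifier without string manipulation.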
Data Model Issues:
Long range, some additional tweaks to the FRS data model may be needed in order to enhance data representation and better support Linked Open Data. Some of these are described in brief below.

Linked Open Data Development:
Potential collaboration with
• Joshua Lieberman (OGC Geospatial Semantics SWG)
• Spatial Ontology Community of Practice
• Jim Hendler (RPI), George Thomas (HHS): CIO Council and Data.gov Geospatial Semantics threads
• John Harman / Michael Pendleton (LOD, SRS)
• Steve Young / Zach Scott / Open Gov Team (LOD)
• Talis, pending contract (LOD)
• TRI Program (Potential Pilot)
• Kevin Kirby (Data Model)
• Tom Giffen (Data Model, Business Rules)
• Ken Blumberg (Business Rules)
• Cindy Dickinson (Standards, Business Rules)
• Others (program offices, regions, GISWG)

Existing Resources
• Leverage Data Modeling work that Kevin Kirby has been working on
• Drill into gist.owl and other potential resources

Short-Term data needs:
• Semantic Enhancements / Linked Open Data
  Improvement of capabilities for supporting Linked Open Data applications: analysis of data structure toward supporting faceted, dimensional analyses (Figure 3), and development of URI schemes, potentially namespaces, and means and approaches for allowing unique identification and linkage.

Figure 3: Potential Facets / Dimensions for Analysis and Semantic Enhancement
(Diagram labels, in extraction order: Administrative POC; Site-level Organizational Affiliation; Legal POC; Operational POC; Ultimate Organizational Parent; Lat/Long; People; Physical USPS Address; Municipality; Organizational Dimension; HUC Code; Spatial; Temporal Dimension; Site Dimension; Regulatory Dimension; Program IDs; Activity; NAICS Code; SIC Code)

• Semantic Dimensions: Explore various dimensions of facility:
• Spatial
  o GML representation of absolute location (lat/long, etc.)
  o Spatial representation framework for facility (building footprints, parcel boundary, others for future)
  o Facility data modeling granularity and relationships: get a better handle on what the facility "thing" represents, and its relation to other things. For example, a parcel boundary may contain an industrial complex with manufacturing and storage buildings (differing NAICS, possibly even different companies operating and licensed/permitted), plus associated air stacks, SPCC measures, water outfalls, et cetera. When we pull up "facility" it should ultimately reflect that bigger picture for context, with the component of interest highlighted.
• Temporal
  o Data currency
  o Temporal aspects to regulation, enforcement, permitting, et cetera (future)
• Corporate Dimension
  o Corporate ownership, at facility level and at ultimate corporate parent level
• Function - Activity and Use
  o NAICS/SIC Codes
  o EPA Regulatory program
  o EPA Interest Type
  o Linkages / translation between interest type and other ontologies/vocabularies
  o Linkages to regulatory programs and other components
• Interrelationships of facilities (future)
• Individuals
  o Friend-of-a-friend (FOAF) and other existing RDF constructs
• Many other potential enhancements

Potential Pilots
A number of potential pilots for mashups can be considered. What may be "low-hanging fruit" for OEI would build upon known internal assets, i.e.
• FRS
• TRI (Toxic Release Quantities for Given Location)
• SRS (Substance)

Potentially, as one scenario, one could tie TRI discharges to reaches via OW web services and TRI reported receiving waters, and then tie this to observed impacts downstream.

One caveat of using EPA data is that it is known to EPA users, but ideally needs to be more fully fleshed out to make it discoverable and uniquely identifiable for external users, perhaps via embedded EPA identifiers (perhaps an epa: namespace or similar means of identifying our assets).

Other potential scenarios TBD: OECA targeted enforcement vs. OSHA, or OPP vs. USDA pesticides application data.

Longer-Range, Emergent data needs:
These are not specific to LOD, but are instead emergent attributes of interest for FRS. LOD approaches may help inform how to structure these.
• HUC Codes
  Completing the prepopulation of HUC Codes can support identification of facilities impacting major watersheds, e.g. Chesapeake Bay (OECA need). Other potential needs: airsheds.
• Municipality
  Toward improving data quality: a physical street address may include a ZIP Code for a city which is different from the actual municipality where the site resides. For example, Suburban Drive, State College PA is actually in Ferguson Township, PA, so the local planning and building code officials and emergency responders who either have or need information on the facility of interest would differ from those of the listed city.
• Relationship
  Ability to relate facilities: relating individual components of a larger system of infrastructure, such as relating a gas terminal to a compressor station, where changes to one may impact others. Ability to organize information in appropriate fashions, such as relating multiple individual oil platforms with discrete permits to a lease boundary with another level of permitting.
• Indian Country
  More robust identification/validation of facilities which may lie within tribal boundaries: refinement of IND-3 boundaries with other source data, and analysis of flows containing either a tribal flag (Y/N) and/or a tribal identifier (tribe/reservation name) (collaboration with Elizabeth Jackson / Ed Liu)
• Facility Definition
  Potential broadening of scope and use of FRS to accommodate grant award locations and other types of locations (2005 NAPA Report recommendations for consistent agencywide site identification). May be predicated on buildout of other capabilities, such as being able to relate sites.

Other Ongoing, Related Activities
A number of activities, internal and external, can help to inform the direction and data model for FRS data collection and publishing activities. Some of these are listed below:
• Potential EPA Corporate ID Workgroup
  Collaborate with TRI, TSCA, FRP, RMP and others who collect corporate parent information, as well as OECA and others who need corporate parent information to support analysis.
• White House Corporate ID Workgroup
  Collaborate with the emergent White House Corporate ID workgroup (Beth Noveck / Steve Croley, SEC, Labor and other agencies) to align, coordinate and collaborate on corporate identifiers.
• OpenGov
  Collaboration with EPA Open Gov initiatives to inform how best to publish data for external reuse.
• National Academy of Public Administration
  Follow-through on 2005 NAPA Report recommendations
• Spatial Ontology Community of Practice (SOCOP)
  Collaboration on vocabularies, standards and data modeling approaches
• Data.Gov Data Architecture Subgroup
  Collaboration on vocabularies, standards and data modeling approaches
• EPA OEI/OIC/IESD Data Standards Branch
  Collaboration on vocabularies, standards and data modeling approaches
• Others…

Anticipated Next Steps:
TBD: develop ideas for potential pilots, engage on the “LOD Cookbook” and approaches for representing and rendering our data as RDF.