healthdata.gov
    now and next
 challenges overview

hhs ocio, health datapalooza 2012
session agenda
•   now
    –   tools and features


•   next
    –   target architecture


•   challenges
    –   explanations in sequence


                              1
now – tools and features
•   Drupal
    –   publishing workflow and community engagement
•   Solr
    –   faceted search
•   CKAN
    –   ‘on demand resources’ (RESTful API and feeds)
•   EC2
    –   powered by GovCloud
•   github.com/hhs
    –   public repos coming soon!

                             2
publishing workbench
•   insert interesting workbench screenshot




                       3
community engagement
•   insert interesting community engagement
    screenshot
•   question and/or ideas example




                      4
faceted search




      5
hub.healthdata.gov/api/rest/dataset


                           step 1:
                         HTTP GET
                         /dataset
                         collection
                          as JSON
                          (GUID or name)
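
Step 1 could be scripted roughly as follows. This is a minimal sketch, assuming the endpoint returns a JSON array of dataset names or GUIDs, as the slide suggests. The `http://` scheme, the helper names, and the sample payload are assumptions for illustration, and the host may no longer be reachable.

```python
import json
import urllib.request

# Host taken from the slide; the scheme is an assumption.
BASE_URL = "http://hub.healthdata.gov"


def dataset_collection_url(base_url=BASE_URL):
    """Return the URL of the /dataset collection."""
    return base_url.rstrip("/") + "/api/rest/dataset"


def parse_dataset_collection(body):
    """Parse the collection response, assumed to be a JSON array of strings."""
    items = json.loads(body)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of dataset names or GUIDs")
    return [str(item) for item in items]


def fetch_dataset_collection(base_url=BASE_URL):
    """Perform the HTTP GET; needs network access and is not called below."""
    with urllib.request.urlopen(dataset_collection_url(base_url)) as response:
        return parse_dataset_collection(response.read().decode("utf-8"))


# Hypothetical payload standing in for a live response.
sample_body = '["hospice-medicare-cost-report-data", "example-dataset"]'
names = parse_dataset_collection(sample_body)
```

Keeping URL construction and parsing separate from the network call lets the parsing be checked against a saved response.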




                 6
hub.healthdata.gov/api/rest/dataset/{name}



                               step 2:
                              HTTP GET
                                each
                              /dataset
                            (as JSON, RDF/XML, or
                                     N3)
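
Step 2 amounts to building one URL per dataset and per format. The sketch below assumes the JSON form lives under `/api/rest/dataset/{name}` (as on this slide) and the RDF/XML form under `/dataset/{name}.rdf` (as on a later slide). The `.n3` suffix, the `http://` scheme, and the function name are assumptions by analogy, not confirmed by the deck.

```python
from urllib.parse import quote

# Host taken from the slide; the scheme is an assumption.
BASE_URL = "http://hub.healthdata.gov"


def dataset_urls(name, base_url=BASE_URL):
    """Return candidate URLs for one dataset in each serialization."""
    base = base_url.rstrip("/")
    safe_name = quote(name, safe="")  # percent-encode anything unusual in the name
    return {
        "json": f"{base}/api/rest/dataset/{safe_name}",
        "rdf": f"{base}/dataset/{safe_name}.rdf",
        "n3": f"{base}/dataset/{safe_name}.n3",  # assumed by analogy with .rdf
    }


urls = dataset_urls("hospice-medicare-cost-report-data")
```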




                    7
hub.healthdata.gov/api/search/dataset?q=medicare+costs




                                       JSON
                                     results for
                                    ‘medicare’
                                    and ‘costs’
                                      search
                                       query
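
A search request like the one in the slide title can be assembled from individual terms. This is a sketch only: the response shape (`count` plus `results`) is an assumption about the search endpoint, and the function names and sample payload are hypothetical.

```python
import json
from urllib.parse import urlencode

# Host taken from the slide; the scheme is an assumption.
BASE_URL = "http://hub.healthdata.gov"


def search_url(terms, base_url=BASE_URL):
    """Build the dataset search URL; spaces between terms are encoded as '+'."""
    query = urlencode({"q": " ".join(terms)})
    return f"{base_url.rstrip('/')}/api/search/dataset?{query}"


def parse_search_results(body):
    """Return (count, results) from a response assumed to carry both keys."""
    payload = json.loads(body)
    return payload.get("count", 0), payload.get("results", [])


url = search_url(["medicare", "costs"])

# Hypothetical response body, not captured from the live service.
count, results = parse_search_results(
    '{"count": 1, "results": ["hospice-medicare-cost-report-data"]}'
)
```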


                          8
hub.healthdata.gov/feeds/dataset.atom



                          atom feed
                            for all
                           datasets
                           (including recent
                             updates and
                               changes)
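
A client could poll the feed and list recent changes along these lines. The sketch uses only the standard Atom namespace; the sample feed is hand-written for illustration and is not output captured from hub.healthdata.gov.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written sample; a real feed would carry more elements per entry.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Datasets</title>
  <entry>
    <title>Example dataset</title>
    <updated>2012-06-05T00:00:00Z</updated>
  </entry>
</feed>"""


def parse_atom_entries(feed_xml):
    """Return a list of (title, updated) pairs, one per Atom entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        entries.append((title, updated))
    return entries


entries = parse_atom_entries(SAMPLE_FEED)
```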




                  9
hub.healthdata.gov/feeds/custom.atom?q=medicare+cost




                                     custom
                                     search
                                      query
                                   result atom
                                      feed
                                     (anything with
                                    ‘medicare+cost’)



                         10
next – target architecture
•   linked data
    –   (closed) google knowledge graph
    –   open health knowledge graph
•   integration framework
    –   top down modeling
    –   bottom up mapping
    –   social curation




                            11
#gkg – (closed) ‘things, not strings’



     “The Knowledge Graph helps us
         understand the relationships
    between things [… that are] linked
        in our graph. […] It’s not just a
     catalog of objects; it also models
    all these inter-relationships.” source




                               12
open health knowledge graph




             13
health.data.gov/id/hospital/393303




                14
clinical quality linked data (HDI II)




                 15
lifting and enrichment




          16
Linked Data Integration Framework
                    GKG/Watson/Siri/…        healthdata.gov


                                                   PCAST DEAS



                                                      HKG




                Variety
                Volume
                Velocity




Health Data Actor
                                        17
social meta/data – graph curation




                18
i2 challenges
• two types
  – three domain specific
     • improve the integration and liquidity of data made available
  – four platform specific
     • enhance the capabilities of the technology components


• 3 release rounds
  – sequenced to leverage dependencies
     • round 1: June through October 2012
     • round 2: November 2012 through May 2013
     • round 3: June through December 2013

                                19
round 1 challenges
• June 2012 through October 2012

  – domain specific
     • [1.1] cross domain and domain specific metadata
         – voluntary consensus standards organizations, de facto
           standards, other


  – platform specific
     • [1.2] Simplified Sign On (SSO)
         – WebID identity provider and relying parties, HDP infrastructure
           components


  – $35K: $20K 1st, $10K 2nd, $5K 3rd place prizes
                                  20
round 2 challenges
• November 2012 through May 2013

  – domain specific
     • [2.3] Mapping, Reconciliation and Correlation
         – structural variety, authoritative URIs, linking heuristics


  – platform specific
     • [2.4] Faceted Browsing and Visualization
         – D3 (backbone, jQuery, etc.)
     • [2.5] Custom API
         – Linked Data API ‘configurator’ for dataset resources


             » each of these builds on [1.1] results

                                    21
round 3 challenges
• June 2013 through December 2013

  – domain specific
     • [3.6] Correlating HHS and NHS Classifications
        – structural variety, authoritative URIs, linking heuristics


  – platform specific
     • [3.7] Linked Data API based Data Element Access Services
        – ‘securing the data, not just the device’
             » builds on [1.1], [1.2], and [2.5]




                                   22
domain challenge [1.1]
• Metadata
  – requests the application of existing voluntary
    consensus standards for metadata common to all
    open government data
  – and invites new designs for health domain specific
    metadata to classify datasets in our growing catalog,
    creating entities, attributes and relations
  – that form the foundations for better discovery,
    integration and liquidity.


• 374 on challenge.gov

                           23
W3C SKOS – concept schemes




            24
W3C DCAT – data catalogs




           25
hub.healthdata.gov/dataset/hospice-medicare-cost-report-data.rdf




                                            rdf/xml
                                          output uses
                                          dublin core
                                           and dcat
                                          metadata
                                         (mapping issues to work
                                            out, N3 output is
                                           incomplete, etc.)
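
Reading the Dublin Core and DCAT properties back out of such a document might look like the sketch below. The sample RDF/XML is hand-written (its title and keywords are placeholders, not the hub's actual output), and the function name is hypothetical. RDF/XML can be serialized in several equivalent shapes, so a dedicated RDF parser would be more robust than this element-level approach.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Hand-written sample, not the real output of the .rdf endpoint.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://hub.healthdata.gov/dataset/hospice-medicare-cost-report-data">
    <dct:title>Placeholder title</dct:title>
    <dcat:keyword>medicare</dcat:keyword>
    <dcat:keyword>hospice</dcat:keyword>
  </dcat:Dataset>
</rdf:RDF>"""


def summarize_dataset(rdf_xml):
    """Extract the URI, dct:title, and dcat:keyword values of the first dataset."""
    root = ET.fromstring(rdf_xml)
    dataset = root.find("dcat:Dataset", NS)
    if dataset is None:
        raise ValueError("no dcat:Dataset element found")
    return {
        "uri": dataset.get("{%s}about" % NS["rdf"]),
        "title": dataset.findtext("dct:title", default="", namespaces=NS),
        "keywords": [kw.text for kw in dataset.findall("dcat:keyword", NS)],
    }


summary = summarize_dataset(SAMPLE_RDF)
```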

                               26
https://github.com/HHS/hd2-ckan/blob/master/templates/package/read.rdf




                                            ckan script
                                           that creates
                                           dc and dcat
                                            metadata
                                           tags / values
                                             (thanks @JoshData!
                                              public github repo
                                                    soon :-)

                                  27
W3C Data Cube – statistics




refactor CQLD
vocabs/data?
start here and
 follow imports

                  28
W3C Provenance – change mgmt




                    apply to CKAN
                      /revisions




             29
hub.healthdata.gov/revision




             30
W3C org – organization




          31
quantity, units, dimensions, time




                32
OGC GeoSPARQL – geospatial




            33
OMG BMM – business motivation




              34        image source
CQLD domain specific




         35
platform challenge [1.2]
• WebID based SSO
  – will improve community engagement
  – by providing simplified sign on (SSO) for external
    users interacting across multiple HDP technology
    components,
  – making it easier for community collaborators to
    contribute,
  – leveraging new approaches to decentralized
    authentication.


• 375 on challenge.gov
                           36
relying party WebID login




            37
identity provider WebID login




              38
edit WebID property ACL at IdP




              39
property is now visible to the RP




                40
domain challenge [2.3]
• Mapping, Reconciliation and Correlation
   – builds on the Metadata domain challenge [1.1]
   – begins by acknowledging disparate open government publishing
     practices
   – and seeks the demonstration of an innovative and automated
     solution for transforming semi-structured data into structured data,
   – reconciles decentralized distributions about the same data entity
     against the master identity of an authoritative source,
   – and correlates these master identities when multiple authoritative
     sources exist,
   – enabling the network effect by introducing strong identity resolution
     techniques that ease the ability to aggregate different data about
     the same entities from independent publishers.
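
The reconciliation step can be illustrated with a deliberately simple sketch: normalize each published label and look it up against an index built from an authoritative source. The hospital names and identifiers below are made up, the URI pattern follows the health.data.gov/id/hospital/… form shown earlier in the deck, and exact matching after normalization stands in for the richer linking heuristics the challenge asks for.

```python
import re


def normalize(label):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", label.lower())
    return " ".join(cleaned.split())


def build_index(authoritative):
    """Map normalized authoritative labels to their master URIs."""
    return {normalize(label): uri for label, uri in authoritative.items()}


def reconcile(labels, index):
    """Return {label: master URI, or None when no match is found}."""
    return {label: index.get(normalize(label)) for label in labels}


# Hypothetical authoritative records; names and identifiers are placeholders.
AUTHORITATIVE = {
    "Example General Hospital": "http://health.data.gov/id/hospital/000001",
    "Sample Memorial Medical Center": "http://health.data.gov/id/hospital/000002",
}

index = build_index(AUTHORITATIVE)
matches = reconcile(
    ["EXAMPLE GENERAL HOSPITAL.", "sample  memorial medical-center", "Unknown Clinic"],
    index,
)
```

Labels that differ only in case, punctuation, or spacing resolve to the same master URI; anything else comes back as `None` and would need fuzzier heuristics or human curation.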


                                  41
automating structural transformations




                 42
‘reconciling’ strings to things




               43
result: turtle is the new JSON!




              44
link automation heuristics editor




                45
platform challenge [2.4]
• Faceted Browsing and Visualization
   – builds on the Metadata domain challenge [1.1]
   – uses the most popular browser based UI frameworks and libraries
     to realize novel exploration and discovery techniques for traversing
     large amounts of interrelated data,
   – contributing to a growing collection of open source widgets that
     make it easy for third parties to create new applications and embed
     health data in their content.




                                 46
surfing the domain schemata




 no domain knowledge
  required to discover
entities and relationships
                 47
agents construct e/r queries




Siri, which {LA County}
Hospitals have the best
 {Heart Attack} stats?
               48
d3 (jQuery, backbone, etc.)




             49
platform challenge [2.5]
• Custom API
  – also builds on the Metadata domain challenge [1.1]
  – makes it possible to tune programmatic access in accordance
    with dataset metadata, leveraging an existing ‘Web 3.0’
    framework and Linked Data API (LDA) implementation to provide
    specialized interfaces




                              50
a ‘Web 3.0’ API ‘configurator’

• Linked Data API (LDA)
 – http://code.google.com/p/linked-data-api/
   •   open source impl here
       –   http://code.google.com/p/puelia-php/
   •   example usage here
       –   http://reference.data.gov.uk/doc/department
   •   example api reference docs here
       –   http://environment.data.gov.uk/lab/doc/api-bwq-reference-
           v0.2.html
   •   commercialization example here
       –   http://kasabi.com/tour


                               51
domain challenge [3.6]
• Correlating HHS – NHS Classifications
   – builds on both the Metadata [1.1] and Mapping, Reconciliation and
     Correlation [2.3] domain challenges,
   – and uses the US and UK health domain specific classification
     schemes to exercise the capabilities demonstrated by the
     automated solution to [2.3],
   – resulting in better international integration of frameworks for
     understanding societal outcomes and their corresponding health
     statistics.




                                52
platform challenge [3.7]
• Linked Data API based Data Element Access Services
   – builds on the Metadata domain challenge [1.1], and the WebID
     based SSO [1.2], and Custom API [2.5] platform challenges
   – augmenting WebID based authentication with metadata driven
     authorization,
   – introducing an innovative security and privacy implementation of
     ‘data element access services’ (DEAS) as described by the PCAST
     Health IT Report,
   – resulting in a Custom API configured by domain specific metadata
     that governs fine grained access to provide the right data to the
     right user.


• ‘secure the data, not just the devices’
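
The idea of metadata driven, fine grained access can be sketched as a filter applied to each record before it is returned. This is plain Python standing in for the concept, not an implementation of the Linked Data API or of any privacy vocabulary. The record fields, the user identifiers, and the policy structure are all hypothetical.

```python
def authorize_view(record, policy, user):
    """Return only the attributes of `record` that `policy` lets `user` read.

    `policy` maps an attribute name to the set of user identifiers allowed
    to read it. Attributes without a policy entry are withheld (deny by default).
    """
    return {
        name: value
        for name, value in record.items()
        if user in policy.get(name, set())
    }


# Hypothetical users, identified by WebID-style URIs.
USER_1 = "https://example.org/people/user1#me"
USER_2 = "https://example.org/people/user2#me"

# Hypothetical record and per-attribute policy.
record_1101 = {"name": "placeholder name", "measure": "placeholder value"}
policy_1101 = {"name": {USER_1}, "measure": {USER_1}}

view_for_user_1 = authorize_view(record_1101, policy_1101, USER_1)
view_for_user_2 = authorize_view(record_1101, policy_1101, USER_2)
```

With this policy the first user sees every attribute and the second sees none, which mirrors the two authorization outcomes for the same resource.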
                                53
LDA + PPO = DEAS




       54
Privacy Preference Ontology (PPO)




                55
user 1 AuthZ ‘1101’ all attributes




                56
multiple machine readable formats




                57
user 2 AuthZ ‘1101’ no attributes




                58
thanks!
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix drm: <http://vocab.data.gov/def/drm#> .
@prefix sdo: <http://schema.org/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix dc: <http://purl.org/dc/terms/> .


<http://hhs.gov/staff/georgethomas#>
    rdf:type drm:DataSteward , sdo:Person ;
    vcard:email "george dot thomas 1 at hhs dot gov" ;
    dc:contributor <healthdata.gov>, <data.gov/semantic> .

                           59

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

HDI III - Healthdata.gov - Now, Next and Challenges

  • 1. healthdata.gov now and next challenges overview hhs ocio, health datapalooza 2012
  • 2. session agenda • now – tools and features • next – target architecture • challenges – explanations in sequence 1
  • 3. now – tools and features • Drupal – publishing workflow and community engagement • Solr – faceted search • CKAN – ‘on demand resources’ (RESTful API and feeds) • EC2 – powered by GovCloud • github.com/hhs – public repos coming soon! 2
  • 4. publishing workbench • insert interesting workbench screenshot 3
  • 5. community engagement • insert interesting community engagement screenshot • question and/or ideas example 4
  • 7. hub.healthdata.gov/api/rest/dataset step 1: HTTP GET /dataset collection as JSON (GUID or name) 6
  • 8. hub.healthdata.gov/api/rest/dataset/{name} step 2: HTTP GET each /dataset (as JSON, RDF/XML, or N3) 7
  • 9. hub.healthdata.gov/api/search/dataset?q=medicare+costs JSON results for ‘medicare’ and ‘costs’ search query 8
  • 10. hub.healthdata.gov/feeds/dataset.atom atom feed for all datasets (including recent updates and changes) 9
  • 11. hub.healthdata.gov/feeds/custom.atom?q=medicare+cost custom search query result atom feed (anything with ‘medicare+cost’) 10
  • 12. next – target architecture • linked data – (closed) google knowledge graph – open health knowledge graph • integration framework – top down modeling – bottom up mapping – social curation 11
  • 13. #gkg – (closed) ‘things, not strings’ “The Knowledge Graph helps us understand the relationships between things [… that are] linked in our graph. […] It’s not just a catalog of objects; it also models all these inter-relationships.” source 12
  • 16. clinical quality linked data (HDI II) 15
  • 18. Linked Data Integration Framework GKG/Watson/Siri/… healthdata.gov PCAST DEAS HKG Variety Volume Velocity Health Data Actor 17
  • 19. social meta/data – graph curation 18
  • 20. i2 challenges • two types – three domain specific • improve the integration and liquidity of data made available – four platform specific • enhance the capabilities of the technology components • 3 release rounds – sequenced to leverage dependencies • round 1: June through October 2012 • round 2: November 2012 through May 2013 • round 3: June through December 2013 19
  • 21. round 1 challenges • June 2012 through October 2012 – domain specific • [1.1] cross domain and domain specific metadata – voluntary consensus standards organizations, de facto standards, other – platform specific • [1.2] Simplified Sign On (SSO) – WebID identity provider and relying parties, HDP infrastructure components – $35K: $20K 1st, $10K 2nd, $5K 3rd place prizes 20
  • 22. round 2 challenges • November 2012 through May 2013 – domain specific • [2.3] Mapping, Reconciliation and Correlation – structural variety, authoritative URIs, linking heuristics – platform specific • [2.4] Faceted Browsing and Visualization – D3 (backbone, jQuery, etc.) • [2.5] Custom API – Linked Data API ‘configurator’ for dataset resources » each of these builds on [1.1] results 21
  • 23. round 3 challenges • June 2013 through December 2013 – domain specific • [3.6] Correlating HHS and NHS Classifications – structural variety, authoritative URIs, linking heuristics – platform specific • [3.7] Linked Data API based Data Element Access Services – ‘securing the data, not just the device’ » builds on [1.1], [1.2], and [2.5] 22
  • 24. domain challenge [1.1] • Metadata – requests the application of existing voluntary consensus standards for metadata common to all open government data – and invites new designs for health domain specific metadata to classify datasets in our growing catalog, creating entities, attributes and relations – that form the foundations for better discovery, integration and liquidity. • 374 on challenge.gov 23
  • 25. W3C SKOS – concept schemes 24
  • 26. W3C DCAT – data catalogs 25
  • 27. hub.healthdata.gov/dataset/hospice-medicare-cost-report-data.rdf rdf/xml output uses dublin core and dcat metadata (mapping issues to work out, N3 output is incomplete, etc.) 26
  • 28. https://github.com/HHS/hd2-ckan/blob/master/templates/package/read.rdf ckan script that creates dc and dcat metadata tags / values (thanks @JoshData! public github repo soon :-) 27
  • 29. W3C Data Cube – statistics refactor CQLD vocabs/data? start here and follow imports 28
  • 30. W3C Provenance – change mgmt apply to CKAN /revisions 29
  • 32. W3C org – organization 31
  • 34. OGC GeoSPARQL – geospatial 33
  • 35. OMG BMM – business motivation 34 image source
  • 37. platform challenge [1.2] • WebID based SSO – will improve community engagement – by providing simplified sign on (SSO) for external users interacting across multiple HDP technology components, – making it easier for community collaborators to contribute, – leveraging new approaches to decentralized authentication. • 375 on challenge.gov 36
  • 40. edit WebID property ACL at IdP 39
  • 41. property is now visible to the RP 40
  • 42. domain challenge [2.3] • Mapping, Reconciliation and Correlation – builds on the Metadata domain challenge [1.1] – begins by acknowledging disparate open government publishing practices – and seeks the demonstration of an innovative and automated solution for transforming semi-structured data into structured data, – reconciles decentralized distributions about the same data entity against the master identity of an authoritative source, – and correlates these master identities when multiple authoritative sources exist, – enabling the network effect by introducing strong identity resolution techniques that ease the ability to aggregate different data about the same entities from independent publishers. 41
  • 45. result: turtle is the new JSON! 44
  • 47. platform challenge [2.4] • Faceted Browsing and Visualization – builds on the Metadata domain challenge [1.1] – uses the most popular browser based UI frameworks and libraries to realize novel exploration and discovery techniques for traversing large amounts of interrelated data, – contributing to a growing collection of open source widgets that make it easy for third parties to create new applications and embed health data in their content. 46
  • 48. surfing the domain schemata no domain knowledge required to discover entities and relationships 47
  • 49. agents construct e/r queries Siri, which {LA County} Hospitals have the best {Heart Attack} stats? 48
  • 51. platform challenge [2.5] • Custom API – also builds on the Metadata domain challenge [1.1] – makes it possible to tune programmatic access in accordance with dataset metadata, leveraging an existing ‘Web 3.0’ framework and Linked Data API (LDA) implementation to provide specialized interfaces 50
  • 52. a ‘Web 3.0’ API ‘configurator’ • Linked Data API (LDA) – http://code.google.com/p/linked-data-api/ • open source impl here – http://code.google.com/p/puelia-php/ • example usage here – http://reference.data.gov.uk/doc/department • example api reference docs here – http://environment.data.gov.uk/lab/doc/api-bwq-reference-v0.2.html • commercialization example here – http://kasabi.com/tour 51
  • 53. domain challenge [3.6] • Correlating HHS – NHS Classifications – builds on both the Metadata [1.1] and Mapping, Reconciliation and Correlation [2.3] domain challenges, – and uses the US and UK health domain specific classification schemes to exercise the capabilities demonstrated by the automated solution to [2.3], – resulting in better international integration of frameworks for understanding societal outcomes and their corresponding health statistics. 52
  • 54. platform challenge [3.7] • Linked Data API based Data Element Access Services – builds on the Metadata domain challenge [1.1], and the WebID based SSO [1.2] and Custom API [2.5] platform challenges – augmenting WebID based authentication with metadata driven authorization, – introducing an innovative security and privacy implementation of ‘data element access services’ (DEAS) as described by the PCAST Health IT Report, – resulting in a Custom API configured by domain specific metadata that governs fine grained access to provide the right data to the right user. • ‘secure the data, not just the devices’ 53
  • 55. LDA + PPO = DEAS 54
  • 57. user 1 AuthZ ‘1101’ all attributes 56
  • 59. user 2 AuthZ ‘1101’ no attributes 58
  • 60. thanks! @prefix drm: <http://vocab.data.gov/def/drm#> @prefix sdo: <http://schema.org/> @prefix vcard: <http://www.w3.org/2006/vcard/ns#> @prefix dc: <http://purl.org/dc/terms/> <http://hhs.gov/staff/georgethomas#> rdf:type drm:DataSteward , sdo:Person ; vcard:email “george dot thomas 1 at hhs dot gov” ; dc:contributor <healthdata.gov>, <data.gov/semantic> . 59
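
The two-step dataset walk on slides 7 and 8, together with the search and feed URLs on slides 9 to 11, can be sketched in Python as below. This is a minimal illustration and not the project's own client: the `http://` scheme, the function names, and the assumption that step 1 returns a JSON array of dataset names (or GUIDs) are all hypothetical; only the host and paths come from the slides. The code builds URLs and parses a response body that is passed in, so it makes no network calls.

```python
import json
from urllib.parse import quote, urlencode

# Host and paths are taken from the slides; the scheme is an assumption.
BASE = "http://hub.healthdata.gov"


def dataset_collection_url(base=BASE):
    """Step 1 (slide 7): URL of the /dataset collection."""
    return f"{base}/api/rest/dataset"


def dataset_url(name, base=BASE):
    """Step 2 (slide 8): URL of a single dataset, by name or GUID."""
    return f"{base}/api/rest/dataset/{quote(name, safe='')}"


def search_url(terms, base=BASE):
    """Slide 9: search URL, e.g. ?q=medicare+costs."""
    return f"{base}/api/search/dataset?{urlencode({'q': ' '.join(terms)})}"


def custom_feed_url(terms, base=BASE):
    """Slide 11: Atom feed restricted to a search query."""
    return f"{base}/feeds/custom.atom?{urlencode({'q': ' '.join(terms)})}"


def plan_dataset_requests(collection_json, base=BASE):
    """Turn a step 1 response body into the list of step 2 URLs.

    Assumes the body is a JSON array of dataset names or GUIDs.
    """
    names = json.loads(collection_json)
    return [dataset_url(name, base) for name in names]


if __name__ == "__main__":
    print(dataset_collection_url())
    print(search_url(["medicare", "costs"]))
    print(custom_feed_url(["medicare", "cost"]))
    # Sample body using the dataset name that appears on slide 27.
    for url in plan_dataset_requests('["hospice-medicare-cost-report-data"]'):
        print(url)
```

A caller would fetch the step 1 URL with any HTTP client, hand the body to `plan_dataset_requests`, and then fetch each returned URL. Keeping URL construction separate from transport makes the sketch easy to check without the live service.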