Towards a brokering framework for knowledge-based services:
Learning from the Pistoia Alliance SESL pilot

Ian Harrow, PhD
Co-Leader of Pistoia Alliance SESL pilot (ex-Pfizer)
Founder, Director & Principal Consultant at Ian Harrow Consulting Ltd



Bio IT World, Hanover, October 2011


http://pistoiaalliance.org
Outline

• Industry Drivers
• Mission and Strategy of Pistoia
• Vision for the SESL pilot
• Minimal configuration to test a
  brokering service
• Public demonstrator and standards
• Deliverables achieved by SESL pilot
• Learning and future direction
What is Core to your Business? What is Critical?

[Figure: 2 x 2 matrix with axes "Core?" and "Critical?", annotated with the years 1990 and 2012]
• Focus Staff on Innovation
• Externalize for Best Practices
• Reduce Non-Value Added Work
• Externalize for Cost Reduction
Why the Pistoia Alliance?

• Industry was at a crossroads (Henry Chesbrough, UC Berkeley, 2011)
  – Change in business models required
• We are all in this (mess) together (Life Science,
  technology vendors, service IT, academia, etc.)
• Need industry-applicable services and
  standards
• Collect all the stakeholders together
  – Agree on commonly shared, pre-competitive use
    cases
• Focus on delivery of proofs of concept to
  stimulate and foster new business models
The Mission of the Pistoia Alliance



Lowering the barriers to innovation
by improving the interoperability of
R&D business processes
via pre-competitive collaborations


Pistoia Alliance Membership, Sept 2011
A Reality Check: Setting Expectations
Signpost clearly

Pistoia Strategy
Domains of Action

• Biology & Translational Medicine
• Chemistry
• Scientific Collaboration
The Focus of Each Domain

• Biology: Big Data, Analytics, Semantics
• Chemistry: Supply Chain, Tech Transfer
• Scientific Collaboration: Vocabularies, Use Cases, Best Practices
Try this at your desk….

Which diseases are correlated with the gene TCF7L2?

• Gene/Protein
• Literature – Abstracts
• Literature – Full Text
• Inherited diseases
• Gene expression
Try it again with Pistoia’s SESL….

• Gene naming/synonyms
• Gene function
• Literature statistics
• Disease co-occurrences
• Gene/protein interactions

…all in one report from one search

HOW? A standard vocabulary, data model, query language, report structure, etc.
SESL Pilot project description
• Deliverables:
   – Publication of standards and recommendations for brokering service
     implementation
   – Public demonstrator service for a single disease area
   – Dialogue and assessment of potential business impact with key content
     suppliers

• Scope:
   – Development of an assertion database in combination with a user
     interface and associated web services for one
     disease/indication/phenotype of broad interest: Type II Diabetes
   – Assertional content derived from 3 structured data sources and limited
     Journal content (co-occurrence and statistical derivation from full text)
   – Assertional evidence for filtering and drill down to primary data.
   – Limited vocabulary development for area of focus: Type II Diabetes

• Participants and Cost:
   – AZ, Pfizer, GSK, Roche, Unilever, EMBL-EBI, NPG, OUP, Elsevier & RSC
   – Single contract between Pistoia Alliance & EMBL-EBI
   – £200K cost (= 2 x FTEs), shared by industry
   – 12-month project, January 2010 start
The Knowledge Service Framework

[Figure: layered architecture, top to bottom]
• Multiple Consumers, behind the ‘Consumer’ Firewall
  – Knowledge Applications: Disease Dossier
• Common Service Broker, built on Open Standards
  – Service Layer
  – Assertion & Meta Data Management
  – Transform/Translate (RDF triples)
  – Integrator/Aggregator (Triple store)
  – Alongside: Std Public Vocabularies, Business Rules
• Content Suppliers, behind the Supplier Firewall
  – Corpus 1, Db 2, Db 3, Db 4, Corpus 5
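The transform and integrate steps of the framework can be sketched in a few lines of Python. This is a minimal illustration, not the SESL implementation: the `TripleStore` class, the `to_triples` helper, the `ex:` predicate names and the sample records are all hypothetical, and a real service would rely on an RDF library and a proper triple store.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Tiny in-memory stand-in for a real triple store (hypothetical)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add_all(self, triples: Iterable[Triple]) -> None:
        # Integrate/aggregate: a set silently drops exact duplicates.
        self._triples.update(triples)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        # None acts as a wildcard, loosely like a variable in a query pattern.
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def to_triples(source: str, record: Dict[str, str]) -> List[Triple]:
    """Transform one supplier record into an assertion with its provenance."""
    assertion = f"assertion:{source}:{record['gene']}"
    return [
        (assertion, "ex:gene", record["gene"]),
        (assertion, "ex:disease", record["disease"]),
        (assertion, "ex:source", source),  # kept so users can drill down
    ]


# Hypothetical records from two suppliers behind the supplier firewall.
store = TripleStore()
store.add_all(to_triples("db2", {"gene": "TCF7L2", "disease": "type 2 diabetes"}))
store.add_all(to_triples("corpus1", {"gene": "TCF7L2", "disease": "type 2 diabetes"}))

# Which sources assert something about TCF7L2?
subjects = {s for (s, _, _) in store.match(p="ex:gene", o="TCF7L2")}
sources = sorted(o for (s, _, o) in store.match(p="ex:source") if s in subjects)
print(sources)
```

Keeping the source as its own triple is one simple way to carry the assertional evidence needed for filtering and drill-down to primary data.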
Minimal configuration to test the technical feasibility of a Knowledge Broker Service

Condition: identical structure, different content which can overlap.

[Figure: three layers, top to bottom]
• Interface Layer
  – A single User Interface
• Brokering service Layer: Broker #1 (Triple store 1) and Broker #2 (Triple store 2), each with
  – Service Layer
  – Assertion & Meta Data Mgmt
  – Transform / Translate
  – Std Public Vocabularies
  – Query templates
• Primary source Layer (left to right)
  – EBI Uniprot database, NCBI OMIM database, UK-Pubmed Central corpus, Elsevier corpus, RSC corpus, EBI Array Express database, EBI Uniprot database, NPG corpus, OUP corpus
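Because both brokers share one structure, the interface layer can send the same query to each and merge the answers. The sketch below assumes a simple gene-to-disease index per broker; the `BrokerIndex` shape, the function names and the sample content are hypothetical. The only detail taken from the slide is that the EBI Uniprot database appears twice in the primary source layer, so content can overlap.

```python
from typing import Dict, List, Set

# Each broker answers the same query shape: gene -> {disease: {sources}}.
BrokerIndex = Dict[str, Dict[str, Set[str]]]

# Hypothetical content with an overlap on UniProt.
BROKER_1: BrokerIndex = {
    "TCF7L2": {"type 2 diabetes": {"UniProt", "OMIM"}},
}
BROKER_2: BrokerIndex = {
    "TCF7L2": {"type 2 diabetes": {"UniProt", "ArrayExpress"}},
    "INS": {"type 2 diabetes": {"ArrayExpress"}},
}


def query_broker(broker: BrokerIndex, gene: str) -> Dict[str, Set[str]]:
    """One query template works for every broker with this structure."""
    return broker.get(gene, {})


def query_all(brokers: List[BrokerIndex], gene: str) -> Dict[str, Set[str]]:
    """Interface layer: fan out one query and merge overlapping answers."""
    merged: Dict[str, Set[str]] = {}
    for broker in brokers:
        for disease, sources in query_broker(broker, gene).items():
            # Union of sources, so overlapping content is reported once.
            merged.setdefault(disease, set()).update(sources)
    return merged


result = query_all([BROKER_1, BROKER_2], "TCF7L2")
print({disease: sorted(srcs) for disease, srcs in result.items()})
```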
Simple Graphical User Interface to the SESL public demonstrator

1. Single point of query through a simple GUI
   Show: A. Gene Query and/or B. Disease Query
   Filtered by:
   1) Everything
   2) Consensus
   3) Co-occurrence
   4) OMIM
   5) Array Express

2. Aggregated Results on a single web page
   A. Gene query results summary
   1) Co-occurrence Documents
   2) Uniprot names and annotation
   3) OMIM disease names
   4) Array express disease and/or pancreas expression
   5) Uniprot GO terms
   6) Uniprot Binary interactions
   B. Disease query results summary
   1) Co-occurrence Documents
   2) OMIM disease names
   3) Array express disease expression
   Full text detail (A and B): Title, Authors, Citation; co-occurrence of gene and disease mentions in text extracts
   The results include links out to the primary sources

SESL public demonstrator: http://www.pistoia-sesl.org
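The "Filtered by" options can be read as predicates over the evidence attached to each result. The sketch below is an assumption about how such a filter might behave, not a description of the demonstrator: the gene names and evidence sets are placeholders, and "Consensus" is taken here to mean support from at least two sources, which the slides do not define.

```python
from typing import Dict, List, Set

# Hypothetical evidence per gene: which sources support a disease link.
EVIDENCE: Dict[str, Set[str]] = {
    "GENE_A": {"Co-occurrence", "OMIM", "Array Express"},
    "GENE_B": {"Co-occurrence"},
    "GENE_C": {"Array Express"},
}


def apply_filter(evidence: Dict[str, Set[str]], mode: str) -> List[str]:
    """Return the genes that pass the chosen filter, sorted by name."""
    if mode == "Everything":
        keep = list(evidence)
    elif mode == "Consensus":
        # Assumption: consensus = supported by two or more sources.
        keep = [g for g, srcs in evidence.items() if len(srcs) >= 2]
    else:
        # Any other mode is treated as the name of a single source.
        keep = [g for g, srcs in evidence.items() if mode in srcs]
    return sorted(keep)


for mode in ["Everything", "Consensus", "Co-occurrence", "OMIM", "Array Express"]:
    print(mode, apply_filter(EVIDENCE, mode))
```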
Type 2 diabetes genes in SESL demonstrator

Columns ("-Src" = value at the source, "-SESL" = value in SESL):
  UP   = UniProt diabetes mention
  GS   = Google Scholar: type 2 diabetes, 2006 to June 2011
  PM   = Pubmed: type 2 diabetes, June 2011
  CoOc = SESL: gene and type 2 diabetes co-occurrence in Full Text
  OM   = OMIM diabetes mention
  AE   = Array Express Atlas pancreas
  GO   = Uniprot GO terms
  BI   = Uniprot Intact binary interactions

Human protein names                                  Gene        UP-Src  UP-SESL       GS       PM     CoOc   OM-Src  OM-SESL   AE-Src  AE-SESL   GO-Src  GO-SESL   BI-Src  BI-SESL
ATP-binding cassette sub-family C member 8           ABCC8            1        1      753       37        6        6        6        5        7        7        9        0        0
Calpain-10                                           CAPN10           1        1      810      168       21        1        1        1        1       12       12        0        0
Glucokinase                                          GCK              1        1    3,950      708       12        7        7        0        0       19       19        2        2
Hematopoietically-expressed homeobox protein         HHEX             0        0      626       91       24        1        0        2        2       21       23        3        0
Hepatocyte nuclear factor 1-alpha                    HNF1A            1        1      633      340       23        3        4        2        2       12       12        6        6
Hepatocyte nuclear factor 1-beta                     HNF1B            1        1      408      269       20        1        1        2        2        9        8        1        0
Hepatocyte nuclear factor 4-alpha                    HNF4A            1        1      811      173       34        2        2        3        3       22       20        5        5
Insulin                                              INS              2        1  166,000   37,670        5        9        0        7        0       59       59        0        0
Insulin receptor substrate 1                         IRS1             1        1    7,970      616        9        1        0        2        2       24       24        3        0
Insulin receptor                                     INSR             1        1    14,00    4,830       16        2        4        6        6       41       43        9        9
ATP-sensitive inward rectifier potassium channel 11  KCNJ11           1        1    1,260       45       35        3        1        0        0       12       12        1        0
Hepatic triacylglycerol lipase                       LIPC             1        0    2,090       89        1        1        1        1        1       17       17        0        0
C-Jun-amino-terminal kinase-interacting protein 1    MAPK8IP1         1        1      248        4        1        1        1        1        1        6        6        4        4
Neurogenic differentiation factor 1                  NEUROD1          1        1      549       50        7        2        2        2        4       13       14        0        0
Pancreas/duodenum homeobox protein 1                 PDX1             1        1    2,270      154        9        2        0        1        1        9        9        0        0
Peroxisome proliferator-activated receptor gamma     PPARG            1        1    9,540    1,556       48        1        1        2        2       40       42        7        7
Protein phosphatase 1 regulatory subunit 3A          PPP1R3A          1        1      141       23        3        1        0        1        1        2        2        0        0
Zinc transporter 8                                   SLC30A8          1        0      724      117        0        2        1        3        4       13       13        0        0
Transcription factor 7-like 2                        TCF7L2           1        1    2,000      284       65        1        1        3        3       33       31        5        5
Mitochondrial brown fat uncoupling protein 1         UCP1             1        0    1,760       50        3        0        0        0        0        6        6        0        0
Gene discovery in SESL demonstrator

[Figure: four-set Venn diagram of gene count intersections from the data sources in the demonstrator]
• Pancreas expression in Array Express db
• T2D disease gene mention in OMIM db
• T2D disease genes in Full Text documents
• T2D disease gene mention in Uniprot db
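Gene count intersections of this kind can be derived by recording, for each gene, the exact set of sources that mention it. The sketch below shows the counting step only. The gene identifiers and set memberships are placeholders, not the demonstrator's data, and `venn_regions` is a hypothetical helper name.

```python
from typing import Dict, FrozenSet, Set

# Placeholder gene sets per source; the real sets are not listed in the slides.
SETS: Dict[str, Set[str]] = {
    "ArrayExpress": {"G1", "G2", "G3"},
    "OMIM": {"G2", "G3", "G4"},
    "FullText": {"G3", "G4", "G5"},
    "UniProt": {"G3"},
}


def venn_regions(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Count genes that belong to exactly each combination of sources."""
    membership: Dict[str, Set[str]] = {}
    for source, genes in sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(source)
    counts: Dict[FrozenSet[str], int] = {}
    for sources in membership.values():
        key = frozenset(sources)
        counts[key] = counts.get(key, 0) + 1
    return counts


regions = venn_regions(SETS)
for sources, count in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(sources), count)
```

Each gene lands in exactly one region, so the region counts add up to the number of distinct genes across all sources.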
Selected content loaded as RDF triples

 Source Description                                # triples    %
 Expression data Array Express                        182,840    0.5%
 Experimental Factor Ontology from Array Express       49,026    0.1%
 Disease vocabulary from UMLS                       6,906,735   18.8%
 Vocabulary from Disease Ontology                   1,863,664    5.1%
 Terms from Gene Ontology                             495,595    1.3%
 Human genes from Uniprot                          12,552,239   34.1%
 Meta data from Full Text documents                 3,485,212    9.5%
 Gene annotations from Full Text documents          2,373,584    6.5%
 Disease annotations from Full Text documents       4,983,788   13.6%
 GO annotations from Full Text documents            3,870,834   10.5%
 Totals                                            36,763,517   100%
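The figures in this table can be checked with a short calculation. The snippet below uses only the counts listed above. It recomputes the total and the percentage column, and adds one derived figure that is not on the slide: the combined share of the four "Full Text" rows, which comes to roughly 40% of all triples.

```python
# Triple counts copied from the table above.
TRIPLE_COUNTS = {
    "Expression data Array Express": 182_840,
    "Experimental Factor Ontology from Array Express": 49_026,
    "Disease vocabulary from UMLS": 6_906_735,
    "Vocabulary from Disease Ontology": 1_863_664,
    "Terms from Gene Ontology": 495_595,
    "Human genes from Uniprot": 12_552_239,
    "Meta data from Full Text documents": 3_485_212,
    "Gene annotations from Full Text documents": 2_373_584,
    "Disease annotations from Full Text documents": 4_983_788,
    "GO annotations from Full Text documents": 3_870_834,
}

total = sum(TRIPLE_COUNTS.values())
shares = {name: round(100 * n / total, 1) for name, n in TRIPLE_COUNTS.items()}

# Combined share of the rows derived from full text documents.
full_text = sum(n for name, n in TRIPLE_COUNTS.items() if "Full Text" in name)
full_text_share = round(100 * full_text / total, 1)

print(f"total triples: {total:,}")
for name, pct in shares.items():
    print(f"{name:<48}{pct:>6.1f}%")
print(f"full text share: {full_text_share}%")
```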
Signposting: Standards used in SESL (blending of existing standards)

Category        Name                             Community
                RDF                              W3C
                SPARQL                           W3C
Triple Store    Jena, Sesame, Virtuoso           Open Source
                IeXML                            EBI & CALBC
Text Mining     LexEBI/BioLexicon                EBI, NaCTeM, U of Pisa
                CALBC                            EBI & CALBC
                UniProt                          EBI, PIR, SIB, etc.
                Disease Ontology and UMLS        OBO, NIH/NLM
URIs            ArrayExpress                     EBI
                NCBI Taxonomy                    NCBI
                Dublin Core                      W3C
                N3 notation                      W3C
RDF Schema      Co-occurrence of gene-disease    EBI
                PMC doc standard                 NCBI
                Relation ontology                OBO
Ontology        URI server                       W3C
The Deliverables of the SESL pilot

• A proof-of-concept to demonstrate feasibility and
  clarify requirements
  – http://www.pistoia-sesl.org
• A functional specification for query brokering,
  result filtering, report generation
  – Expect publication by end 2011
  – http://www.pistoiaalliance.com/workinggroups/sesl.html

• Academia, Life Science Industry and Publishers
  – Attained a better understanding of each other’s needs
  – Demonstration of potential for a new business model
  – Explore follow-on via Open Innovation consortia
Learning and Future Direction

• Framework to maximise re-use of existing standards
  – Minimise use of bespoke, hard-coded implementations
• Crucial features of a knowledge brokering service:
  – RDF triples for a scalable meta index to broker across
    primary sources (both databases and literature)
  – Important to define business rules for query & extraction
  – Recommend a registry of suitable data sources
     • similar to a web services registry
• What is next?
  – Example follow-on to the SESL pilot: Open PHACTS consortium (www.openphacts.org)
  – 3-year IMI pre-competitive project (started early 2011)
  – Data providers and Life Science industry working together
Acknowledgements

Industry
• Wendy Filsell – Unilever (SESL co-leader)
• Ian Stott – Unilever
• Nigel Wilkinson – PFE
• Catherine Marshall – PFE
• Peter Woollard – GSK
• Ashley George – GSK
• Mike Westaway – AZ
• Nick Lynch – AZ
• Ian Dix – AZ
• Michael Braxenthaler – Roche
• John Wise – Pistoia Alliance

EMBL-EBI
• Dietrich Rebholz-Schuhmann (Technical Team Leader)
• Christoph Grabmueller
• Silvestras Kavaliauskas
• Dominic Clark
• Rodrigo Lopez
• Jo McEntyre – UK-PMC
• Janet Thornton

Publishers
• Claire Bird – OUP
• Richard O’Beirne – OUP
• Colin Batchelor – RSC
• Richard Kidd – RSC
• David Hoole – NPG
• Alf Eaton – NPG
• Jabe Wilson – Elsevier
• Bradley Allen – Elsevier

Más contenido relacionado

Destacado

Pistoia Alliance: Emerging Life Sciences Collaboration on Common Service Spec...
Pistoia Alliance: Emerging Life Sciences Collaboration on Common Service Spec...Pistoia Alliance: Emerging Life Sciences Collaboration on Common Service Spec...
Pistoia Alliance: Emerging Life Sciences Collaboration on Common Service Spec...Ian Harrow
 
Data Integration Score Card
Data Integration Score CardData Integration Score Card
Data Integration Score CardSciBite Limited
 
SciBite overview July 2013
SciBite overview July 2013SciBite overview July 2013
SciBite overview July 2013SciBite Limited
 
Conference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutConference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutJosef Scheiber
 
AllegroGraph - AGWebView
AllegroGraph - AGWebViewAllegroGraph - AGWebView
AllegroGraph - AGWebViewCraig Norvell
 
Data: The Good, The Bad & The Ugly
Data: The Good, The Bad & The UglyData: The Good, The Bad & The Ugly
Data: The Good, The Bad & The UglySciBite Limited
 
Pistoia Alliance Debates: Ontologies mapping webinar 23rd Feb 2017
Pistoia Alliance Debates: Ontologies mapping webinar 23rd Feb 2017Pistoia Alliance Debates: Ontologies mapping webinar 23rd Feb 2017
Pistoia Alliance Debates: Ontologies mapping webinar 23rd Feb 2017Pistoia Alliance
 
SciBite - Role Of Ontologies (Pistoia Alliance Webinar)
SciBite - Role Of Ontologies (Pistoia Alliance Webinar)SciBite - Role Of Ontologies (Pistoia Alliance Webinar)
SciBite - Role Of Ontologies (Pistoia Alliance Webinar)SciBite Limited
 
Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016Pistoia Alliance
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesSciBite Limited
 
Big Data in Pharma - Overview and Use Cases
Big Data in Pharma - Overview and Use CasesBig Data in Pharma - Overview and Use Cases
Big Data in Pharma - Overview and Use CasesJosef Scheiber
 
Mobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per SmartphoneMobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per SmartphoneJosef Scheiber
 

Destacado (13)

Pistoia Alliance: Emerging Life Sciences Collaboration on Common Service Spec...
Pistoia Alliance: Emerging Life Sciences Collaboration on Common Service Spec...Pistoia Alliance: Emerging Life Sciences Collaboration on Common Service Spec...
Pistoia Alliance: Emerging Life Sciences Collaboration on Common Service Spec...
 
Data Integration Score Card
Data Integration Score CardData Integration Score Card
Data Integration Score Card
 
Scibite - We Do.
Scibite - We Do.Scibite - We Do.
Scibite - We Do.
 
SciBite overview July 2013
SciBite overview July 2013SciBite overview July 2013
SciBite overview July 2013
 
Conference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutConference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in Noordwijkerhout
 
AllegroGraph - AGWebView
AllegroGraph - AGWebViewAllegroGraph - AGWebView
AllegroGraph - AGWebView
 
Data: The Good, The Bad & The Ugly
Data: The Good, The Bad & The UglyData: The Good, The Bad & The Ugly
Data: The Good, The Bad & The Ugly
 
Pistoia Alliance Debates: Ontologies mapping webinar 23rd Feb 2017
Pistoia Alliance Debates: Ontologies mapping webinar 23rd Feb 2017Pistoia Alliance Debates: Ontologies mapping webinar 23rd Feb 2017
Pistoia Alliance Debates: Ontologies mapping webinar 23rd Feb 2017
 
SciBite - Role Of Ontologies (Pistoia Alliance Webinar)
SciBite - Role Of Ontologies (Pistoia Alliance Webinar)SciBite - Role Of Ontologies (Pistoia Alliance Webinar)
SciBite - Role Of Ontologies (Pistoia Alliance Webinar)
 
Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future Challenges
 
Big Data in Pharma - Overview and Use Cases
Big Data in Pharma - Overview and Use CasesBig Data in Pharma - Overview and Use Cases
Big Data in Pharma - Overview and Use Cases
 
Mobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per SmartphoneMobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
Mobile Health Forum Frankfurt - Therapieempfehlung per Smartphone
 

Similar a Pistoia Alliance SESL pilot Bio IT World Hanover 12 Oct 2011

Enterprise Integration of Disruptive Technologies
Enterprise Integration of Disruptive TechnologiesEnterprise Integration of Disruptive Technologies
Enterprise Integration of Disruptive TechnologiesDataWorks Summit
 
MarkLogic Applications in Healthcare
MarkLogic Applications in HealthcareMarkLogic Applications in Healthcare
MarkLogic Applications in HealthcareTony Agresta
 
Monolix Day 2011
Monolix Day 2011Monolix Day 2011
Monolix Day 2011blaudez
 
PCTY 2012, Risk Based Access Control v. Pat Wardrop
PCTY 2012, Risk Based Access Control v. Pat WardropPCTY 2012, Risk Based Access Control v. Pat Wardrop
PCTY 2012, Risk Based Access Control v. Pat WardropIBM Danmark
 
Metadata in general and Dublin Core in specific; some experiences
Metadata in general and Dublin Core in specific; some experiencesMetadata in general and Dublin Core in specific; some experiences
Metadata in general and Dublin Core in specific; some experiencesKerstin Forsberg
 
Powering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopPowering Next Generation Data Architecture With Apache Hadoop
Powering Next Generation Data Architecture With Apache HadoopHortonworks
 
Cheng bearing point
Cheng bearing pointCheng bearing point
Cheng bearing pointsouthmos
 
Results of the Apollon pilot in homecare and independent living
Results of the Apollon pilot in homecare and independent livingResults of the Apollon pilot in homecare and independent living
Results of the Apollon pilot in homecare and independent livingimec.archive
 
Reflections on knowledge management practice case study
Reflections on knowledge management practice    case studyReflections on knowledge management practice    case study
Reflections on knowledge management practice case studyRichard Vines
 
Tackling big data with hadoop and open source integration
Tackling big data with hadoop and open source integrationTackling big data with hadoop and open source integration
Tackling big data with hadoop and open source integrationDataWorks Summit
 
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntelAPAC
 
Pistoia Alliance: SESL Pilot for a Biomedical Brokering Service
Pistoia Alliance: SESL Pilot for a Biomedical Brokering ServicePistoia Alliance: SESL Pilot for a Biomedical Brokering Service
Pistoia Alliance: SESL Pilot for a Biomedical Brokering ServiceIan Harrow
 
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...
10052012 luc vervenne synergetics van syntax portfolio naar semantische uitwi...Stichting ePortfolio Support
 
SOA and Cloud in Life Sciences
SOA and Cloud in Life SciencesSOA and Cloud in Life Sciences
SOA and Cloud in Life SciencesSandeep Bhat
 
MITA Beyond MMIS Presentation
MITA Beyond MMIS PresentationMITA Beyond MMIS Presentation
MITA Beyond MMIS PresentationREMilk
 
Dorado Hybrid Cloud Use Case
Dorado Hybrid Cloud Use CaseDorado Hybrid Cloud Use Case
Dorado Hybrid Cloud Use CaseSVForum Cloud SIG
 

Similar a Pistoia Alliance SESL pilot Bio IT World Hanover 12 Oct 2011 (20)

Enterprise Integration of Disruptive Technologies
Enterprise Integration of Disruptive TechnologiesEnterprise Integration of Disruptive Technologies
Enterprise Integration of Disruptive Technologies
 
MarkLogic Applications in Healthcare
MarkLogic Applications in HealthcareMarkLogic Applications in Healthcare
MarkLogic Applications in Healthcare
 
Monolix Day 2011
Monolix Day 2011Monolix Day 2011
Monolix Day 2011
 
PCTY 2012, Risk Based Access Control v. Pat Wardrop
PCTY 2012, Risk Based Access Control v. Pat WardropPCTY 2012, Risk Based Access Control v. Pat Wardrop
PCTY 2012, Risk Based Access Control v. Pat Wardrop
 
Pistoia Alliance SESL pilot Bio IT World Hanover 12 Oct 2011

  • 1. Towards a brokering framework for knowledge-based services: Learning from the Pistoia Alliance SESL pilot Ian Harrow, PhD Co-Leader of Pistoia Alliance SESL pilot (ex-Pfizer) Founder, Director & Principal Consultant at Ian Harrow Consulting Ltd Bio IT World, Hanover, October 2011 http://pistoiaalliance.org
  • 2. Outline • Industry Drivers • Mission and Strategy of Pistoia • Vision for the SESL pilot • Minimal configuration to test a brokering service • Public demonstrator and standards • Deliverables achieved by SESL pilot • Learning and future direction
  • 3. What is Core to your Business? What is Critical? [2x2 matrix with axes "Core?" and "Critical?", spanning 1990 to 2012. Quadrants: Focus Staff on Innovation; Externalize for Best Practices; Reduce Non-Value Added Work; Externalize for Cost Reduction.]
  • 4. Why the Pistoia Alliance? • Industry was at a crossroads (Henry Chesbrough, UC Berkeley, 2011) – Change in business models required • We are all in this (mess) together (Life Science, technology vendors, service IT, academia, etc.) • Need industry-applicable services and standards • Collect all the stakeholders together – Agree on commonly shared, pre-competitive use cases • Focus on delivery of proofs of concept to stimulate and foster new business models
  • 5. The Mission of the Pistoia Alliance: Lowering the barriers to innovation by improving the interoperability of R&D business processes via pre-competitive collaborations
  • 8. A Reality Check: Setting Expectations
  • 11. Domains of Action • Biology & Translational Medicine • Chemistry • Scientific Collaboration
  • 12. The Focus of Each Domain • Biology: Big Data, Analytics, Semantics • Chemistry: Supply Chain, Tech Transfer • Scientific Collaboration: Vocabularies, Use Cases, Best Practices
  • 13. Try this at your desk… Which diseases are correlated with the gene TCF7L2? • Gene/Protein • Literature (abstracts) • Literature (full text) • Inherited diseases • Gene expression
  • 14. Try it again with Pistoia’s SESL… • Gene naming/synonyms • Gene function • Literature statistics • Disease co-occurrences • Gene/protein interactions …all in one report from one search. HOW? A standard vocabulary, data model, query language, report structure, etc.
  • 15. SESL Pilot project description • Deliverables: – Publication of standards and recommendations for brokering service implementation – Public demonstrator service for a single disease area – Dialogue and assessment of potential business impact with key content suppliers • Scope: – Development of an assertion database in combination with a user interface and associated web services for one disease/indication/phenotype of broad interest: Type II Diabetes – Assertional content derived from 3 structured data sources and limited journal content (co-occurrence and statistical derivation from full text) – Assertional evidence for filtering and drill-down to primary data – Limited vocabulary development for area of focus: Type II Diabetes • Participants and Cost: – AZ, Pfizer, GSK, Roche, Unilever, EMBL-EBI, NPG, OUP, Elsevier & RSC – Single contract between Pistoia Alliance & EMBL-EBI – £200K cost (= 2 FTEs), shared by industry – 12-month project, January 2010 start
  • 16. The Knowledge Service Framework [Diagram, top to bottom: multiple consumers (disease dossier, knowledge applications); consumer firewall; service layer; broker, comprising assertion & metadata management, transform/translate (RDF triples) and integrator/aggregator (triple store), supported by standard public vocabularies, common open service standards and business rules; supplier firewall; content suppliers (Corpus 1, Db 2, Db 3, Db 4, Corpus 5).]
  • 17. Minimal configuration to test the technical feasibility of a Knowledge Broker Service [Diagram with three layers. Interface layer: user interface. Brokering service layer: Broker #1 and Broker #2, each with a service layer, standard public vocabularies, assertion & metadata management, transform/translate, query templates and a triple store. Condition: identical structure, different content which can overlap. Primary source layer: RSC corpus, UK-PubMed Central corpus, NPG corpus, OUP corpus, Elsevier corpus, EBI UniProt database, EBI Array Express database, NCBI OMIM database.]
  • 18. Simple Graphical User Interface to the SESL public demonstrator
    1. Single point of query through a simple GUI: A. Gene Query and/or B. Disease Query
    2. Aggregated results on a single web page
    A. Gene query results summary: 1) Co-occurrence documents 2) UniProt names and annotation 3) OMIM disease names 4) Array Express disease and/or pancreas expression 5) UniProt GO terms 6) UniProt binary interactions
    B. Disease query results summary: 1) Co-occurrence documents 2) OMIM disease names 3) Array Express disease expression. Filtered by: 1) Everything 2) Consensus 3) Co-occurrence 4) OMIM 5) Array Express
    Full text detail (both query types): title, authors, citation, and co-occurrence of gene and disease mentions in text extracts
    The results include links out to the primary sources.
    SESL public demonstrator: http://www.pistoia-sesl.org
  • 19. Type 2 diabetes genes in SESL demonstrator
    Human protein name | Human gene name | Source: UniProt diabetes mention | SESL: UniProt diabetes mention | Google Scholar: type 2 diabetes, 2006 to June 2011 | PubMed: type 2 diabetes, June 2011 | SESL: gene and type 2 diabetes co-occurrence in full text | Source: OMIM diabetes mention | SESL: OMIM diabetes mention | Source: Array Express Atlas pancreas | SESL: Array Express pancreas | Source: UniProt GO terms | SESL: GO terms | Source: UniProt IntAct binary interactions | SESL: binary interactions
    ATP-binding cassette sub-family C member 8 | ABCC8 | 1 | 1 | 753 | 37 | 6 | 6 | 6 | 5 | 7 | 7 | 9 | 0 | 0
    Calpain-10 | CAPN10 | 1 | 1 | 810 | 168 | 21 | 1 | 1 | 1 | 1 | 12 | 12 | 0 | 0
    Glucokinase | GCK | 1 | 1 | 3,950 | 708 | 12 | 7 | 7 | 0 | 0 | 19 | 19 | 2 | 2
    Hematopoietically-expressed homeobox protein | HHEX | 0 | 0 | 626 | 91 | 24 | 1 | 0 | 2 | 2 | 21 | 23 | 3 | 0
    Hepatocyte nuclear factor 1-alpha | HNF1A | 1 | 1 | 633 | 340 | 23 | 3 | 4 | 2 | 2 | 12 | 12 | 6 | 6
    Hepatocyte nuclear factor 1-beta | HNF1B | 1 | 1 | 408 | 269 | 20 | 1 | 1 | 2 | 2 | 9 | 8 | 1 | 0
    Hepatocyte nuclear factor 4-alpha | HNF4A | 1 | 1 | 811 | 173 | 34 | 2 | 2 | 3 | 3 | 22 | 20 | 5 | 5
    Insulin | INS | 2 | 1 | 166,000 | 37,670 | 5 | 9 | 0 | 7 | 0 | 59 | 59 | 0 | 0
    Insulin receptor substrate 1 | IRS1 | 1 | 1 | 7,970 | 616 | 9 | 1 | 0 | 2 | 2 | 24 | 24 | 3 | 0
    Insulin receptor | INSR | 1 | 1 | 14,00 | 4,830 | 16 | 2 | 4 | 6 | 6 | 41 | 43 | 9 | 9
    ATP-sensitive inward rectifier potassium channel 11 | KCNJ11 | 1 | 1 | 1,260 | 45 | 35 | 3 | 1 | 0 | 0 | 12 | 12 | 1 | 0
    Hepatic triacylglycerol lipase | LIPC | 1 | 0 | 2,090 | 89 | 1 | 1 | 1 | 1 | 1 | 17 | 17 | 0 | 0
    C-Jun-amino-terminal kinase-interacting protein 1 | MAPK8IP1 | 1 | 1 | 248 | 4 | 1 | 1 | 1 | 1 | 1 | 6 | 6 | 4 | 4
    Neurogenic differentiation factor 1 | NEUROD1 | 1 | 1 | 549 | 50 | 7 | 2 | 2 | 2 | 4 | 13 | 14 | 0 | 0
    Pancreas/duodenum homeobox protein 1 | PDX1 | 1 | 1 | 2,270 | 154 | 9 | 2 | 0 | 1 | 1 | 9 | 9 | 0 | 0
    Peroxisome proliferator-activated receptor gamma | PPARG | 1 | 1 | 9,540 | 1,556 | 48 | 1 | 1 | 2 | 2 | 40 | 42 | 7 | 7
    Protein phosphatase 1 regulatory subunit 3A | PPP1R3A | 1 | 1 | 141 | 23 | 3 | 1 | 0 | 1 | 1 | 2 | 2 | 0 | 0
    Zinc transporter 8 | SLC30A8 | 1 | 0 | 724 | 117 | 0 | 2 | 1 | 3 | 4 | 13 | 13 | 0 | 0
    Transcription factor 7-like 2 | TCF7L2 | 1 | 1 | 2,000 | 284 | 65 | 1 | 1 | 3 | 3 | 33 | 31 | 5 | 5
    Mitochondrial brown fat uncoupling protein 1 | UCP1 | 1 | 0 | 1,760 | 50 | 3 | 0 | 0 | 0 | 0 | 6 | 6 | 0 | 0
  • 20. Gene discovery in SESL demonstrator [Venn diagram: gene count intersections from the data sources in the demonstrator. Sets: pancreas gene expression in Array Express db; T2D disease mention in OMIM db; T2D disease genes in Full Text documents; T2D disease gene mention in UniProt db.]
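The intersections on slide 20 amount to set operations over per-source gene lists. The sketch below is illustrative only: it takes three rows from the slide 19 table, reads the four SESL columns (UniProt diabetes mention, full-text co-occurrence, OMIM diabetes mention, Array Express pancreas) in that order, and treats any non-zero count as membership of that source's set. The non-zero threshold and the helper names are assumptions, not part of the SESL demonstrator.

```python
from itertools import combinations

SOURCES = ("UniProt", "Full Text", "OMIM", "Array Express")

# SESL-column counts for three genes from the slide 19 table, in SOURCES order.
SESL_COUNTS = {
    "TCF7L2": (1, 65, 1, 3),
    "INS": (1, 5, 0, 0),
    "UCP1": (0, 3, 0, 0),
}


def genes_by_source(counts):
    """Map each source to the genes with a non-zero count (assumed threshold)."""
    return {
        source: {gene for gene, row in counts.items() if row[i] > 0}
        for i, source in enumerate(SOURCES)
    }


def intersection_sizes(gene_sets):
    """Count the genes shared by every combination of two or more sources."""
    sizes = {}
    for k in range(2, len(gene_sets) + 1):
        for combo in combinations(gene_sets, k):
            shared = set.intersection(*(gene_sets[s] for s in combo))
            sizes[combo] = len(shared)
    return sizes


gene_sets = genes_by_source(SESL_COUNTS)
sizes = intersection_sizes(gene_sets)

if __name__ == "__main__":
    for combo, size in sizes.items():
        print(" & ".join(combo), "->", size)
```

With only three genes the numbers are not meaningful in themselves; the point is that each region of the diagram is a count over a combination of source sets.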
  • 21. Selected content loaded as RDF triples
    Source description | # triples | %
    Expression data Array Express | 182,840 | 0.5%
    Experimental Factor Ontology from Array Express | 49,026 | 0.1%
    Disease vocabulary from UMLS | 6,906,735 | 18.8%
    Vocabulary from Disease Ontology | 1,863,664 | 5.1%
    Terms from Gene Ontology | 495,595 | 1.3%
    Human genes from UniProt | 12,552,239 | 34.1%
    Meta data from Full Text documents | 3,485,212 | 9.5%
    Gene annotations from Full Text documents | 2,373,584 | 6.5%
    Disease annotations from Full Text documents | 4,983,788 | 13.6%
    GO annotations from Full Text documents | 3,870,834 | 10.5%
    Totals | 36,763,517 | 100%
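As a quick check on the figures in slide 21, the percentage column can be recomputed from the triple counts. This is only a sketch: the counts are copied from the slide, while the dictionary and function names are hypothetical.

```python
# Triple counts per source, copied from slide 21.
TRIPLE_COUNTS = {
    "Expression data Array Express": 182_840,
    "Experimental Factor Ontology from Array Express": 49_026,
    "Disease vocabulary from UMLS": 6_906_735,
    "Vocabulary from Disease Ontology": 1_863_664,
    "Terms from Gene Ontology": 495_595,
    "Human genes from UniProt": 12_552_239,
    "Meta data from Full Text documents": 3_485_212,
    "Gene annotations from Full Text documents": 2_373_584,
    "Disease annotations from Full Text documents": 4_983_788,
    "GO annotations from Full Text documents": 3_870_834,
}


def percentage_shares(counts):
    """Return each source's share of all triples, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total = sum(TRIPLE_COUNTS.values())
shares = percentage_shares(TRIPLE_COUNTS)

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share}%")
    print(f"Total triples: {total:,}")
```

The counts sum to the stated total of 36,763,517, and roughly 40% of the triples come from full-text documents (meta data plus gene, disease and GO annotations).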
  • 22. Signposting: Standards used in SESL (Category: Name (Community))
    Triple Store: RDF (W3C); SPARQL (W3C); Jena, Sesame, Virtuoso (Open Source)
    Text Mining: IeXML (EBI & CALBC); LexEBI/BioLexicon (EBI, NaCTeM, U of Pisa); CALBC (EBI & CALBC)
    Blending of existing standards, URIs: UniProt (EBI, PIR, SIB, etc.); Disease Ontology and UMLS (OBO, NIH/NLM); ArrayExpress (EBI); NCBI Taxonomy (NCBI); Dublin Core (W3C); N3 notation (W3C)
    RDF Schema: Co-occurrence of gene-disease (EBI); PMC doc standard (NCBI); Relation ontology (OBO)
    Ontology: URI server (W3C)
  • 23. The Deliverables of the SESL pilot • A proof of concept to demonstrate feasibility and clarify requirements – http://www.pistoia-sesl.org • A functional specification for query brokering, result filtering, report generation – Expect publication by end 2011 – http://www.pistoiaalliance.com/workinggroups/sesl.html • Academia, Life Science Industry and Publishers – Attained a better understanding of each other’s needs – Demonstration of potential for a new business model – Explore follow-on via Open Innovation consortia
  • 24. Learning and Future Direction • Framework to maximise re-use of existing standards – Minimise use of bespoke, hard-coded implementations • Crucial features of a knowledge brokering service: – RDF triples for a scalable meta-index to broker across primary sources (both databases and literature) – Important to define business rules for query & extraction – Recommend a registry of suitable data sources, similar to a web services registry • What is next? – Example follow-on to the SESL pilot: Open PHACTS consortium => www.openphacts.org – 3-year IMI pre-competitive project (started early 2011) – Data providers and Life Science industry working together
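To make the "RDF triples as a meta-index" idea concrete, here is a minimal sketch of an in-memory assertion index queried by pattern matching. It is not the SESL implementation, which used RDF triple stores and SPARQL. The identifiers, predicate names and example assertions are hypothetical and serve only to show how each assertion can carry a pointer back to its primary source.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: the primary source to drill down into


# Hypothetical assertions; identifiers and predicates are illustrative only.
INDEX = [
    Assertion("gene:TCF7L2", "associatedWith", "disease:T2D", "OMIM"),
    Assertion("gene:TCF7L2", "coOccursWith", "disease:T2D", "FullText"),
    Assertion("gene:GCK", "associatedWith", "disease:T2D", "OMIM"),
    Assertion("gene:GCK", "coOccursWith", "disease:T2D", "FullText"),
]


def match(index, subject=None, predicate=None, obj=None):
    """Return the assertions that fit the pattern; None acts as a wildcard."""
    return [
        a for a in index
        if (subject is None or a.subject == subject)
        and (predicate is None or a.predicate == predicate)
        and (obj is None or a.obj == obj)
    ]


def sources_for(index, subject, obj):
    """List the primary sources holding evidence that links subject and obj."""
    return sorted({a.source for a in match(index, subject=subject, obj=obj)})


if __name__ == "__main__":
    # A disease query: which genes are linked to T2D, and according to whom?
    for a in match(INDEX, obj="disease:T2D"):
        print(a.subject, a.predicate, "via", a.source)
    print(sources_for(INDEX, "gene:TCF7L2", "disease:T2D"))
```

The broker answers a query from the index alone and uses the `source` field to link out to the primary database or document, which is the drill-down behaviour the slides describe.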
  • 25. Acknowledgements
    Industry: Wendy Filsell - Unilever (SESL co-leader); Ian Stott - Unilever; Nigel Wilkinson - PFE; Catherine Marshall - PFE; Peter Woollard - GSK; Ashley George - GSK; Mike Westaway - AZ; Nick Lynch - AZ; Ian Dix - AZ; Michael Braxenthaler – Roche; John Wise – Pistoia Alliance
    EMBL-EBI: Dietrich Rebholz Schuhmann (Technical Team Leader); Christoph Grabmueller; Silvestras Kavaliauskas; Dominic Clark; Roderigo Lopez; Jo McEntyre – UK-PMC; Janet Thornton
    Publishers: Claire Bird – OUP; Richard O’Bierne – OUP; Colin Batchelor – RSC; Richard Kidd – RSC; David Hoole – NPG; Alf Eaton – NPG; Jabe Wilson – Elsevier; Bradley Allen – Elsevier