Towards a Gold Standard: Improving the Quality of Public Domain Chemistry Databases

           Antony J. Williams (1), Sean Ekins (2)

  (1) Royal Society of Chemistry, Wake Forest, NC 27587
  (2) Collaborations in Chemistry, Fuquay Varina, NC 27526
The future: crowdsourced drug discovery
Williams et al., Drug Discovery World, Winter 2009
Chemistry structures are proliferating on the web
   Safety data
   Toxicity data
   Blogs and wikis
   Property databases
   Experimental results
   Scientific publications
   Compound aggregators
   Open Notebook Science
   Metabolic pathway databases
   Encyclopedic articles (Wikipedia)

   Users take them at face value. They SHOULD NOT!!!

    Immense quantities of scientific information are contained in the
    thousands of databases.

    However, progress can be inhibited by errors in these databases and by
    their downstream effects when the data are reused.
                                                  http://bit.ly/zWGaps
What is the Structure of Vitamin K1?
What Mechanisms Do We Have to Alert the Community?
   Email the database owner and hope for a response
   Blog it
      Tony has been blogging about database quality for years and nobody
       was listening, other than the people at PubChem
      For some databases, when he blogged, the owners listened and made edits!

   Tweet it

   Dec 2010: we felt something definitive had to be said about structure
    quality
   Publish it: we wrote to Science, Nature and then PLoS Computational Biology

    http://bit.ly/qtJF2f

                  Perhaps the phone?
April 27, 2011: then came the NPC Browser
               Science Translational Medicine 2011
But wait, hold on: did anyone peer review the database??
Database released, and within days...
A quick analysis of structure quality revealed
hundreds of errors in the structures
                                                   Williams and Ekins,
                                                   DDT, 16: 747-750 (2011)
NPC Browser
http://tripod.nih.gov/npc/
Neomycin in NPC Browser
Neomycin In ChemSpider
How many contribute to clean-up?
   Fewer than a dozen contributors to the data

   The majority are project members

   The crowd is small...
   This is the same for all crowd-based cheminformatics efforts
What Mechanisms Do We Have to Alert the Community?
                  Publishing Is Too Slow

   Tony blogged on April 28th, 1 day after
    release: http://bit.ly/jn8wLC

   I blogged on April 29th (http://bit.ly/lXHInG),
    suggesting the need for a gold standard
    database

   After more extensive analysis, we sent a
    manuscript to Science Translational
    Medicine: rejected

   Drug Discovery Today: accepted, 8 months
    after we first pointed out the issue (we had
    raised it even before the NPC Browser release)
                                                Williams and Ekins,
                                                DDT, 16: 747-750 (2011)
Responses from the Community and NCGC

    Comments on the initial blog post
    NCGC added a disclaimer, which I blogged about on May 23rd:
     http://bit.ly/m4Tx2b

    Sept 8th 2011: email from Tudor Oprea (cc'ed to 60 others).
     He has also been pointing out database errors for years.
    Followed by one from Chris Austin offering to meet us

    Several individuals thanked us for the alert
More Extensive Analysis and Solutions

     More analysis of NPC Browser errors

     “analysis of the NPC browser ‘HTS amenable compounds’ subset of
      data for 7600 compounds identified fundamental errors in
      stereochemistry, valency issues and charge imbalances in a few
      minutes work using a rudimentary software tool”

     Analysis of other chemistry databases and their errors

     Other types of databases and their errors

     Offered solutions

Towards a Gold Standard: Regarding Quality in Public Domain Chemistry Databases and Approaches to Improving the Situation. Antony J. Williams, Sean Ekins and Valery Tkachenko, Drug Discovery Today, In Press 2012
Data Errors in the NPC Browser: Analysis of Steroids

Substructure   | # of hits | # of correct hits | No stereochemistry | Incomplete stereochemistry | Complete but incorrect stereochemistry
Gonane         |        34 |                 5 |                  8 |                         21 |                                      0
Gon-4-ene      |        55 |                12 |                  3 |                         33 |                                      7
Gon-1,4-diene  |        60 |                17 |                 10 |                         23 |                                     10
Towards a Gold Standard: Regarding Quality in Public Domain Chemistry Databases and Approaches to Improving the Situation. Antony J. Williams, Sean Ekins and Valery Tkachenko, Drug Discovery Today, In Press 2012
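The steroid counts are easy to sanity-check. The small Python sketch below is our own illustration, not a tool used in the analysis: it transcribes the three rows, verifies that the four outcome columns account for every hit, and computes the share of hits scored as correct. That works out to roughly 15%, 22% and 28% per substructure, and about 23% of the 149 hits overall.

```python
# Counts transcribed from the NPC Browser steroid table:
# (hits, correct, no stereochemistry, incomplete, complete but incorrect)
STEROID_COUNTS = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}


def percent_correct(counts):
    """Check each row and return the percentage of correct hits per substructure."""
    summary = {}
    for name, (hits, correct, missing, incomplete, wrong) in counts.items():
        # The four outcome columns should partition the hits exactly.
        if correct + missing + incomplete + wrong != hits:
            raise ValueError(f"row {name!r} does not sum to {hits}")
        summary[name] = round(100 * correct / hits, 1)
    # Pool all rows for an overall figure.
    total_hits = sum(row[0] for row in counts.values())
    total_correct = sum(row[1] for row in counts.values())
    summary["All substructures"] = round(100 * total_correct / total_hits, 1)
    return summary


if __name__ == "__main__":
    for name, pct in percent_correct(STEROID_COUNTS).items():
        print(f"{name:18s} {pct:5.1f}% correct")
```

Raising on a row that does not sum to its hit count is deliberate: a transcription slip in the counts would otherwise silently distort the percentages.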
Why does this matter to us and to YOU, the CROWD?
What You Might Not Know About Chemistry Databases on the Internet
   Data-sharing between open databases is cyclic
   This can propagate errors throughout the “Linked Data”
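To illustrate why cyclic sharing matters, here is a minimal Python sketch over a hypothetical sharing graph. The database names are placeholders, and an edge X -> Y simply means "Y imports records from X". A breadth-first search lists every database an error can reach from the point where it was introduced, and a depth-first search detects whether the sharing forms a cycle, in which case a bad record can flow back into its own source and its origin becomes hard to trace.

```python
from collections import deque

# Hypothetical sharing graph: X -> [Y, ...] means each Y imports records from X.
# Names are placeholders, not real resources.
SHARING = {
    "DB_A": ["DB_B", "DB_C"],
    "DB_B": ["DB_D"],
    "DB_C": ["DB_D"],
    "DB_D": ["DB_A"],  # DB_D feeds back into DB_A, closing a cycle
    "DB_E": [],
}


def reached_by_error(graph, source):
    """Return every database an error first entered into `source` can reach (BFS)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def has_cycle(graph):
    """Detect a directed cycle with a three-colour depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY  # node is on the current DFS path
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge: we returned to a node on the current path
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK  # fully explored, no cycle through this node
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))
```

In this toy graph, an error deposited in DB_B reaches DB_D, then DB_A, then DB_C, and DB_A would pass it straight back to DB_B; fixing one copy is not enough while the loop remains.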
Public Domain Databases
   Our databases are a mess...

   Non-curated databases are propagating errors

   We source data from, and deposit data between, databases

   The original sources of errors are hard to determine

   Curation is time-consuming and challenging
Molecule Data Quality Impacts In Silico Drug Discovery
     Vast ligand and protein–protein interaction databases
     Development of computational models

     Global mapping of pharmacological space

     Drug-target networks of approved drugs

     Prediction of off-target effects
Different Types of Databases and Errors
   Bayer paper on target validation: 2/3 of the papers examined did not live up to their claims

   Errors in the MDL Drug Data Report (MDDR)

   Error rates in clinical research databases range from 2.3% to 26.9%

   A multicenter MS-based proteomics analysis identified generic problems in
    databases when characterizing proteins: search engines could not distinguish
    different identifiers, and many algorithms calculated molecular weights incorrectly

   In one database, between 2.1% and 13.6% of annotated Pfam hits were unjustified

   Ligand–protein X-ray structures can also contain errors, with far-reaching
    consequences
Solutions
   Structure validation and standardization
   Curation
   Annotation
   Structure filters
        Incorrect valency, atom labels, aromatic bonds, stereochemistry, salts,
         duplication
   Structure standardization guidelines
        Provided by the FDA (Substance Registration System Unique Ingredient
         Identifier (UNII):
         http://www.fda.gov/ForIndustry/DataStandards/SubstanceRegistrationSystem-UniqueIngredientIdentifierUNII/default.htm)

   We need a record of molecule provenance
   Can we track databases and their quality? www.scidbs.com
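A structure filter for the first two problems in the list (incorrect valency and unrecognized atom labels) can be sketched in a few lines of Python. This is a toy under stated assumptions: molecules are given in a hypothetical format of element symbols plus (atom, atom, bond order) tuples with explicit hydrogens, and the ALLOWED_VALENCE table covers only neutral atoms of a few elements. Real validation tools also handle charges, aromaticity, implicit hydrogens and stereochemistry.

```python
from collections import defaultdict

# Hypothetical, deliberately small valence table for neutral atoms only.
# Charged species, aromatic bonds and implicit hydrogens are out of scope here.
ALLOWED_VALENCE = {
    "H": {1}, "C": {4}, "N": {3}, "O": {2},
    "S": {2, 4, 6}, "F": {1}, "Cl": {1},
}


def flag_structure_errors(atoms, bonds):
    """Return (atom index, reason) pairs for bad labels or impossible valences.

    atoms: element symbols, e.g. ["C", "O", "H", ...]
    bonds: (i, j, order) tuples; hydrogens must be explicit atoms.
    """
    # Sum bond orders at each atom to get its explicit valence.
    valence = defaultdict(int)
    for i, j, order in bonds:
        valence[i] += order
        valence[j] += order

    problems = []
    for idx, symbol in enumerate(atoms):
        allowed = ALLOWED_VALENCE.get(symbol)
        if allowed is None:
            problems.append((idx, f"unrecognized atom label {symbol!r}"))
        elif valence[idx] not in allowed:
            problems.append((idx, f"{symbol} has valence {valence[idx]}"))
    return problems


# Methanol drawn correctly: C bonded to O and three H, O bonded to one H.
METHANOL_ATOMS = ["C", "O", "H", "H", "H", "H"]
METHANOL_BONDS = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1), (1, 5, 1)]

# The same skeleton with the C-O bond mistakenly drawn as a double bond.
BAD_BONDS = [(0, 1, 2)] + METHANOL_BONDS[1:]
```

For example, `flag_structure_errors(METHANOL_ATOMS, BAD_BONDS)` returns `[(0, 'C has valence 5'), (1, 'O has valence 3')]`, while the correctly drawn methanol returns an empty list.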
RSC Introduces “Validation Service”
Scidbs.com
   Database record fields: DB logo, type of DB, contact, owner, website, license, curation, etc.
Data should be:
   Free from structure errors
   Free from data errors
   Free from experimental errors

   Are we asking too much? Is it even possible?

Yet when we alert others:
   When we raise our hands, we are ignored
   Our scientific community needs to wake up
Today
   The NPC Browser has fewer errors... so do ALL databases!
   More people are aware of molecule quality online. Trust is
    earned, not just granted!
   The future database user is more informed


                 Tomorrow
   Peer reviewers test the databases described in manuscripts
   NIH checks databases before release!
   COLLABORATION between government DBs. PLEASE!!!
   We need minimal compound database standards
    (MCDS)
Acknowledgement

We thank the paper reviewers
and blog commenters
for their constructive comments

Chris Lipinski

This work was unfunded
(but was the right thing to do!)


www.scidbs.com

Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiAlinaDevecerski
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...chandars293
 

Último (20)

Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort ServicePremium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
 
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore EscortsCall Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
 
Call Girls Cuttack Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Cuttack Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Cuttack Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Cuttack Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Ludhiana Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 9907093804 Top Class Call Girl Service Available
 
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
 
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore EscortsVIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
 
Call Girls Varanasi Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Varanasi Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 9907093804 Top Class Call Girl Service Available
 
Bangalore Call Girls Nelamangala Number 7001035870 Meetin With Bangalore Esc...
Bangalore Call Girls Nelamangala Number 7001035870  Meetin With Bangalore Esc...Bangalore Call Girls Nelamangala Number 7001035870  Meetin With Bangalore Esc...
Bangalore Call Girls Nelamangala Number 7001035870 Meetin With Bangalore Esc...
 
Call Girls Tirupati Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Tirupati Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Kochi Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kochi Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Kochi Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kochi Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
 
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
💎VVIP Kolkata Call Girls Parganas🩱7001035870🩱Independent Girl ( Ac Rooms Avai...
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
 
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeTop Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
 
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
 
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
 
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiRussian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
 

Improving Public Domain Chemistry Database Quality

  • 1. Towards a Gold Standard: Improving the Quality of Public Domain Chemistry Databases. Antony J. Williams (1), Sean Ekins (2). (1) Royal Society of Chemistry, Wake Forest, NC 27587; (2) Collaborations in Chemistry, Fuquay Varina, NC 27526.
  • 2. The future: crowdsourced drug discovery Williams et al., Drug Discovery World, Winter 2009
  • 3. Chemistry structures are proliferating on the web: safety data, toxicity data, blogs and wikis, property databases, experimental results, scientific publications, compound aggregators, Open Notebook Science, metabolic pathway databases and encyclopedic articles (Wikipedia). Users take them at face value. They SHOULD NOT!!! Immense quantities of scientific information are contained in thousands of databases, but errors in these databases can inhibit progress, with downstream effects when the data are reused. http://bit.ly/zWGaps
  • 4. What is the Structure of Vitamin K1?
  • 5. What mechanisms do we have to alert the community? Email the database owner and hope for a response. Blog it: Tony has been blogging about database quality for years and nobody was listening, other than the people at PubChem; for some databases, when he blogged, they listened and would edit! Tweet it. In Dec 2010 we felt something had to be said definitively about structure quality. Publish it: we wrote to Science, Nature and then PLoS Computational Biology. http://bit.ly/qtJF2f Perhaps the phone?
  • 6. April 27, 2011: then came the NPC Browser (Science Translational Medicine, 2011).
  • 7. But wait, hold on: did anyone peer review the database? Within days of the database's release, a quick analysis of structure quality revealed hundreds of errors in the structures. Williams and Ekins, DDT, 16: 747-750 (2011)
  • 9. Neomycin in NPC Browser http://tripod.nih.gov/npc/
  • 11. How many contribute to clean-up? Fewer than a dozen contributors to the data, the majority of them project members. The crowd is small… This is the same for all crowd-based cheminformatics efforts.
  • 12. What mechanisms do we have to alert the community? Publishing is too slow. Tony blogged on April 28th, 1 day after release (http://bit.ly/jn8wLC). I blogged on April 29th (http://bit.ly/lXHInG), suggesting the need for a gold standard database. After more extensive analysis we sent a manuscript to Science Translational Medicine: rejected. Drug Discovery Today: accepted… 8 months after we first pointed out the issue, which we had done even before the NPC Browser release. Williams and Ekins, DDT, 16: 747-750 (2011)
  • 13. Responses from the community and NCGC: comments on the initial blog post. NCGC added a disclaimer, which I blogged about on May 23rd (http://bit.ly/m4Tx2b). Sept 8th, 2011: an email from Tudor Oprea (cc’ed to 60 others), who has also been pointing out database errors for years, followed by one from Chris Austin offering to meet us. Several individuals thanked us for the alert.
  • 14. More extensive analysis and solutions: more analysis of NPC Browser errors (“analysis of the NPC browser ‘HTS amenable compounds’ subset of data for 7600 compounds identified fundamental errors in stereochemistry, valency issues and charge imbalances in a few minutes work using a rudimentary software tool”); analysis of other chemistry databases and their errors; other types of databases and their errors; offered solutions. Towards a Gold Standard: Regarding Quality in Public Domain Chemistry Databases and Approaches to Improving the Situation. Antony J. Williams, Sean Ekins and Valery Tkachenko, Drug Discovery Today, In Press 2012
  • 15. Data Errors in the NPC Browser: Analysis of Steroids
      Substructure | # of hits | # of correct hits | No stereochemistry | Incomplete stereochemistry | Complete but incorrect stereochemistry
      Gonane | 34 | 5 | 8 | 21 | 0
      Gon-4-ene | 55 | 12 | 3 | 33 | 7
      Gon-1,4-diene | 60 | 17 | 10 | 23 | 10
    Towards a Gold Standard: Regarding Quality in Public Domain Chemistry Databases and Approaches to Improving the Situation. Antony J. Williams, Sean Ekins and Valery Tkachenko, Drug Discovery Today, In Press 2012
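As a quick check on the steroid table, the counts can be turned into the fraction of hits with correct stereochemistry for each substructure. This minimal Python sketch transcribes the slide's counts (the dictionary name and tuple layout are illustrative choices) and verifies that the four stereochemistry categories sum to the number of hits. It deliberately does not pool the rows, since the slide does not say whether the three substructure searches returned overlapping compounds.

```python
# Counts transcribed from the steroid table (slide 15). Tuple fields:
# (hits, correct, no stereochemistry, incomplete, complete but incorrect)
STEROID_COUNTS = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}


def fraction_correct(counts):
    """Return the fraction of hits with correct stereochemistry per substructure."""
    result = {}
    for name, (hits, correct, none, incomplete, wrong) in counts.items():
        # The four categories should partition the hits exactly.
        if correct + none + incomplete + wrong != hits:
            raise ValueError(f"{name}: categories do not sum to {hits}")
        result[name] = correct / hits
    return result


if __name__ == "__main__":
    for name, frac in fraction_correct(STEROID_COUNTS).items():
        print(f"{name:15s} {frac:6.1%}")
```

On the slide's counts this gives about 14.7% (gonane), 21.8% (gon-4-ene) and 28.3% (gon-1,4-diene): fewer than a third of the hits in each class had correct stereochemistry.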
  • 16. Why does this matter to us and to YOU, the CROWD?
  • 17. What You Might Not Know About Chemistry Databases on the Internet: data-sharing between open databases is cyclic, and this can proliferate errors in the “Linked Data”.
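To illustrate how cyclic sharing can keep an error alive, here is a toy Python model, not a description of any real database's synchronization. Three hypothetical databases, A, B and C, each pull records from one neighbour in a cycle, and imported records simply overwrite local ones with no provenance tracking; both behaviours are assumptions made for illustration. The point it shows is that curating the record at its original source is not enough, because the bad copy comes back around the cycle.

```python
from typing import Dict, List

# Hypothetical databases and who each one imports from (a cycle: A -> B -> C -> A).
SOURCES: Dict[str, List[str]] = {"A": ["C"], "B": ["A"], "C": ["B"]}


def sync(dbs: Dict[str, Dict[str, str]], rounds: int) -> None:
    """Each round, every database copies every record from its sources.

    Assumption for illustration: imported records overwrite local ones and
    nothing records which copy is authoritative (no provenance).
    """
    for _ in range(rounds):
        # Snapshot first so every database imports the same round's state.
        snapshot = {name: dict(records) for name, records in dbs.items()}
        for name, upstream in SOURCES.items():
            for src in upstream:
                dbs[name].update(snapshot[src])


def demo() -> Dict[str, Dict[str, str]]:
    dbs: Dict[str, Dict[str, str]] = {"A": {"vitamin K1": "wrong"}, "B": {}, "C": {}}
    sync(dbs, rounds=3)                 # the error spreads around the cycle
    dbs["A"]["vitamin K1"] = "correct"  # the original source is curated...
    sync(dbs, rounds=1)                 # ...but re-imports the bad copy from C
    return dbs


if __name__ == "__main__":
    print(demo())
```

After the final round, A holds the wrong structure again (re-imported from C) while B holds the corrected one, so in this model the error keeps circulating until every copy is fixed or the databases record where each structure came from.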
  • 18. Public Domain Databases: our databases are a mess… Non-curated databases are proliferating errors. We source and deposit data between databases, so the original sources of errors are hard to determine. Curation is time-consuming and challenging.
  • 19. Molecule data quality impacts: in silico drug discovery; vast ligand and protein–protein interaction databases; the development of computational models; global mapping of pharmacological space; drug–target networks of approved drugs; prediction of off-target effects.
  • 20. Different types of databases and errors: in a Bayer paper on target validation, 2/3 of papers did not live up to their claims. The MDL Drug Data Report (MDDR) contains errors. Errors in clinical research databases vary from 2.3% to 26.9%. A multicenter analysis by MS-based proteomics identified generic problems in databases when characterizing proteins: search engines could not distinguish different identifiers, and many algorithms calculated molecular weight incorrectly. One database had between 2.1% and 13.6% of its annotated Pfam hits unjustified. Ligand–protein X-ray structures can also have errors, with far-reaching consequences.
  • 21. Solutions  Structure Validation and Standardization  Curation  Annotation  Structure filters  Incorrect valency, atom labels, aromatic bonds, stereochemistry, salts, duplication  Structure standardization guidelines  Provided by the FDA (Substance Registration System UniqueIngredient Identifier (UNII): http://www.fda.gov/ForIndustry/DataStandards/SubstanceRegistrationSyste m-UniqueIngredientIdentifierUNII/default.htm)  Need a record of molecule provenance  Can we track databases and quality - - www.scidbs.com
  • 23. Scidbs.com
  • 24. Scidbs.com record fields: DB logo, type of DB, contact, owner, website, license, curation, etc.
  • 25. Data should be free from structure errors, free from data errors and free from experimental errors. Are we asking too much? Is it even possible? Yet when we alert others, when we raise our hands, we are ignored. Our scientific community needs to wake up.
  • 26. Today  NPC browser has fewer errors..so do ALL databases!  More people aware of molecule quality online. Trust is earned not just granted!  The future database user is more informed Tomorrow  Peer reviewers test the databases that are in manuscripts  NIH checks databases before release!  COLLABORATION between government DBs. PLEASE!!!  We need minimal compound database standards (MCDS)
  • 27. Acknowledgements: we thank the paper reviewers and blog commenters for their constructive comments, and Chris Lipinski. This work was unfunded (but was the right thing to do!). www.scidbs.com