SlideShare una empresa de Scribd logo
1 de 50
Promiscuous patterns and perils in PubChem  and the MLSCN Jeremy J Yang Cristian Bologa Tudor Oprea Division of Biocomputing Dept of Biochem. & Mol. Biology NM Mol. Libraries Screening Center University of New Mexico
Goals ,[object Object],[object Object],[object Object],[object Object]
Background ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],10 6 10 0
Promiscuity defined ,[object Object],[object Object],[object Object],[object Object],Bioactivity for multiple targets,  i.e. “frequent-hitter”, non-selective binder Multi-target bioactivity for involved scaffold Scaffold may be a determinant or simply an informatic device. Scaffold promiscuity [working definition]
Real vs phony promiscuity ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
PubChem: cathedral or bazaar? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],(a) National Cathedral, Washington, DC (b) Santa Fe Flea Market, Santa Fe, NM a b ref: “The Cathedral and the Bazaar”, Eric Raymond, 1997.
PubChem and MLSCN* ,[object Object],[object Object],[object Object],[object Object],*MLSCN = Molecular Libraries Screening Center Network, to be MLPCN, Molecular Libraries Program Center Network
PubChem and MLSMR ,[object Object],[object Object],[object Object],[object Object],Plot c/o Victor Panchenco, BioFocusDPI
MLSMR and MLSCN actives ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],94,148 104,078 93,616 104,610 532 10,462 (Some MLSCN compounds – esp secondary assays – from other sources such as commercial vendors.)‏ *June 2008
Selected published pre-filtering expert knowledge* ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],*generalizable domain knowledge
Selected pre-filtering  semi-public expertise ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
MLSMR filtering protocols ,[object Object],[object Object],[object Object],[object Object],[object Object]
MLSMR re-filtered... ,[object Object],[object Object],[object Object],*search failed via PC GUI
MLSMR re-filtered... Example rejects:
Pre-filtering at UNM Java servlet using ChemAxon/JChem
Activity multiplicity – all assays compounds active in any assay peril? 103894 MLSCN compounds  724 MLSCN assays
Activity multiplicity – screening assays  compounds active in any  screening  assay peril 89954 MLSCN compounds  268 MLSCN assays
Top 100 hier-scaffolds*, active PubChem MLSMR Example scaffold #1 *Wilkins, J. Med Chem 2005
Top active hier-scaffolds: example #1 Scaffold: 33 rd  most common 510 active compounds  34 assays c1c2c([nH]c(=O)cn2)ncn1  Top 12 compounds Top 12 of 510 compounds CID, #assays active
Top active hier-scaffolds: example #1 #compounds vs #assays in which they are active
Top active hier-scaffolds: example #1 All from NCGC
Other promiscuous scaffolds COMPOUNDS:total/tested/active  ASSAYS:tested/active  SAMPLES:tested/active
Other promiscuous scaffolds COMPOUNDS:total/tested/active  ASSAYS:tested/active  SAMPLES:tested/active
Other promiscuous scaffolds COMPOUNDS:total/tested/active  ASSAYS:tested/active  SAMPLES:tested/active
Activity mining with PubChem GUI Lots of great functionality, but not everything...
Activity mining with command line Automation is good...
Example scaffold #2 62 nd  most common 523 active compounds 208 assays c1ccc(cc1)NC(=O)c2ccco2 Top active hier-scaffolds: example #2 Top 12 of 523 compounds CID, #assays active
Top active hier-scaffolds: example #2 #compounds vs #assays in which they are active
Top active hier-scaffolds: example #2 #compounds vs #assays in which they are active
Top active hier-scaffolds: example #3 Example scaffold #4 1307 th  most common 27 active compounds 140 assays c1nc-2c(=O)[nH]c(=O)nc2[nH]n1 toxoflavin
Top active hier-scaffolds: example #3 #compounds vs #assays in which they are active
Digression: PubChem bug... substructure search for CID 66541 143 hits substructure search for C2(C1=NC=NNC1=NC(N2)=O)=O 143 hits substructure search for c1nc-2c(=O)[nH]c(=O)nc2[nH]n1 0 hits Ergo: aromatic structural queries not allowed?
Top active hier-scaffolds: example #3
More example scaffolds and histograms of #compounds vs. #assays in which compounds are active
Scaffold promiscuity vs. SEA* ,[object Object],SEA (Similarity Ensemble Approach): For given query molecule, are there bioactive similars and for what targets? *Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotech 25 (2), 197-206 (2007). http://shoichetlab.compbio.ucsf.edu/~keiser/sea/
Possibilities ,[object Object],[object Object],[object Object],[object Object],[object Object]
Possibilities ,[object Object],*"Is There a General Model for Bioactivity?", T.I. Oprea, O. Ursu, C.G. Bologa, and L.A. Sklar, The 8th International Conference on Chemical Structures, June, 2008, Noordwijkerhout, The Netherlands (http://www.int-conf-chem-structures.org/pdf/B-6.pdf).
Data mining methodology notes ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],PUG = NCBI PubChem Power User Gateway (http)‏ Entrez = NCBI Entrez eUtils API OE = OpenEye OEChem All code Perl or Python *June 2008
Now for some general comments... 10 6 10 0
Probability – in HTS game etc. In other words: Quantity vs. Quality E not striking out  = 1 – E striking out E success  = 1 – (1 - E hit ) N where E hit  = probability of hit per try N = number of tries
Probability, more hard lessons *&quot;Method and Apparatus for Designing Molecules with Desired Properties by  Evolving Successive Populations,&quot; David Weininger, U.S. patent  US5434796, 1995.  De novo  molecular design Case study: Grok and Grope*, 1992, Weininger, Blaney, Dixon GA -> virtual library, docking fitness, lots of cpu cycles But you need to recognize a good hit, including all aspects of fitness (ADMET, synthesis, etc.).  <- approximation from memory ,[object Object],[object Object],[object Object]
Probability, more quantity vs quality “ Quantitative high-throughput screening: A titration-based approach that efficiently identifies biological activities in large chemical libraries”, Inglese et al., PNAS, 103, 2006, 11473-11478.
Probability and prejudice Lucky CEOs, coaches, and fund managers, Kahneman's 2002 Nobel prize for economics-psychology, good stories vs. Occam's razor, confirmation bias, pathological pattern-recognizers are we.
Statistics and signficance “ The Trouble with QSAR (or How I Learned To Stop Worrying and Embrace Fallacy)”, Stephen R. Johnson, J. Chem. Inf. Model., 48 (1), 25 -26, 2008.
Conclusions ,[object Object],[object Object],[object Object]
Conclusions ,[object Object],[object Object],[object Object]
Conclusions (from Chris Lipinski*)‏ ,[object Object],[object Object],[object Object],[object Object],*Chris Lipinski, Nanosyn Open House talk, Feb 16, 2008.
Acknowledgements, thanks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],OpenEye Software c/o: ,[object Object]
Scaffolds and chemotypes ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],*Ertl, et al., Quest for the Rings
HierScaffolds:  - scaffolds are one or several rings connected by linkers - compounds can be related by any of their scaffolds

Más contenido relacionado

La actualidad más candente

Aug2015 salit standards architecture
Aug2015 salit standards architectureAug2015 salit standards architecture
Aug2015 salit standards architectureGenomeInABottle
 
171114 best practices for benchmarking variant calls justin
171114 best practices for benchmarking variant calls justin171114 best practices for benchmarking variant calls justin
171114 best practices for benchmarking variant calls justinGenomeInABottle
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128GenomeInABottle
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GenomeInABottle
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGenomeInABottle
 
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3GenomeInABottle
 
Scaffold-based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemi...
Scaffold-based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemi...Scaffold-based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemi...
Scaffold-based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemi...Deepak Bandyopadhyay
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekingeProf. Wim Van Criekinge
 
Aug2015 Giab nist integration methods
Aug2015 Giab nist integration methodsAug2015 Giab nist integration methods
Aug2015 Giab nist integration methodsGenomeInABottle
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshopGenomeInABottle
 
2017 amp benchmarking_poster_justin
2017 amp benchmarking_poster_justin2017 amp benchmarking_poster_justin
2017 amp benchmarking_poster_justinGenomeInABottle
 
Giab ashg webinar 160224
Giab ashg webinar 160224Giab ashg webinar 160224
Giab ashg webinar 160224GenomeInABottle
 
Sept2016 plenary nist_intro
Sept2016 plenary nist_introSept2016 plenary nist_intro
Sept2016 plenary nist_introGenomeInABottle
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekingeProf. Wim Van Criekinge
 
170120 giab stanford genetics seminar
170120 giab stanford genetics seminar170120 giab stanford genetics seminar
170120 giab stanford genetics seminarGenomeInABottle
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshopGenomeInABottle
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataPhilip Cheung
 
Bioinformatica 15-12-2011-t9-t10-bio cheminformatics
Bioinformatica 15-12-2011-t9-t10-bio cheminformaticsBioinformatica 15-12-2011-t9-t10-bio cheminformatics
Bioinformatica 15-12-2011-t9-t10-bio cheminformaticsProf. Wim Van Criekinge
 

La actualidad más candente (20)

Aug2015 salit standards architecture
Aug2015 salit standards architectureAug2015 salit standards architecture
Aug2015 salit standards architecture
 
171114 best practices for benchmarking variant calls justin
171114 best practices for benchmarking variant calls justin171114 best practices for benchmarking variant calls justin
171114 best practices for benchmarking variant calls justin
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptx
 
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
 
Scaffold-based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemi...
Scaffold-based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemi...Scaffold-based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemi...
Scaffold-based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemi...
 
Genome in a Bottle
Genome in a BottleGenome in a Bottle
Genome in a Bottle
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
 
Aug2015 Giab nist integration methods
Aug2015 Giab nist integration methodsAug2015 Giab nist integration methods
Aug2015 Giab nist integration methods
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshop
 
2017 amp benchmarking_poster_justin
2017 amp benchmarking_poster_justin2017 amp benchmarking_poster_justin
2017 amp benchmarking_poster_justin
 
Giab ashg webinar 160224
Giab ashg webinar 160224Giab ashg webinar 160224
Giab ashg webinar 160224
 
Sept2016 plenary nist_intro
Sept2016 plenary nist_introSept2016 plenary nist_intro
Sept2016 plenary nist_intro
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
 
170120 giab stanford genetics seminar
170120 giab stanford genetics seminar170120 giab stanford genetics seminar
170120 giab stanford genetics seminar
 
170326 giab abrf
170326 giab abrf170326 giab abrf
170326 giab abrf
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
Bioinformatica 15-12-2011-t9-t10-bio cheminformatics
Bioinformatica 15-12-2011-t9-t10-bio cheminformaticsBioinformatica 15-12-2011-t9-t10-bio cheminformatics
Bioinformatica 15-12-2011-t9-t10-bio cheminformatics
 

Similar a Promiscuous patterns and perils in PubChem and the MLSCN

Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Prof. Wim Van Criekinge
 
A_Pope_RQRM_LeadDisc_June_2016
A_Pope_RQRM_LeadDisc_June_2016A_Pope_RQRM_LeadDisc_June_2016
A_Pope_RQRM_LeadDisc_June_2016Andrew Pope
 
Mashing Up Drug Discovery
Mashing Up Drug DiscoveryMashing Up Drug Discovery
Mashing Up Drug DiscoverySciBite Limited
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europeopen_phacts
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Sunghwan Kim
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAGopen_phacts
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Greg Landrum
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsSean Ekins
 
Molecular modelling for in silico drug discovery
Molecular modelling for in silico drug discoveryMolecular modelling for in silico drug discovery
Molecular modelling for in silico drug discoveryLee Larcombe
 
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...Sean Ekins
 
Sandeep Modi Phildelphia nov10 Drug safety
Sandeep Modi Phildelphia nov10 Drug safetySandeep Modi Phildelphia nov10 Drug safety
Sandeep Modi Phildelphia nov10 Drug safetysm78354
 
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesProcessing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesMartin Hartmann
 
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based PlatformNext-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based PlatformLaura Berry
 
Data Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionData Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionPaul Groth
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
PubChem as a resource for chemical information training
PubChem as a resource for chemical information trainingPubChem as a resource for chemical information training
PubChem as a resource for chemical information trainingSunghwan Kim
 
SOT short course on computational toxicology
SOT short course on computational toxicology SOT short course on computational toxicology
SOT short course on computational toxicology Sean Ekins
 

Similar a Promiscuous patterns and perils in PubChem and the MLSCN (20)

Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014
 
A_Pope_RQRM_LeadDisc_June_2016
A_Pope_RQRM_LeadDisc_June_2016A_Pope_RQRM_LeadDisc_June_2016
A_Pope_RQRM_LeadDisc_June_2016
 
Mashing Up Drug Discovery
Mashing Up Drug DiscoveryMashing Up Drug Discovery
Mashing Up Drug Discovery
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
 
Molecular modelling for in silico drug discovery
Molecular modelling for in silico drug discoveryMolecular modelling for in silico drug discovery
Molecular modelling for in silico drug discovery
 
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s e...
 
Sandeep Modi Phildelphia nov10 Drug safety
Sandeep Modi Phildelphia nov10 Drug safetySandeep Modi Phildelphia nov10 Drug safety
Sandeep Modi Phildelphia nov10 Drug safety
 
Cadd
CaddCadd
Cadd
 
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesProcessing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
 
Practical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS projectPractical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS project
 
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based PlatformNext-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
 
Data Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionData Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tension
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
PubChem as a resource for chemical information training
PubChem as a resource for chemical information trainingPubChem as a resource for chemical information training
PubChem as a resource for chemical information training
 
SOT short course on computational toxicology
SOT short course on computational toxicology SOT short course on computational toxicology
SOT short course on computational toxicology
 

Más de Jeremy Yang

TIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS AnalyticsTIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS AnalyticsJeremy Yang
 
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizerDrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizerJeremy Yang
 
Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
Mining ClinicalTrials.gov via CTTI AACT for drug target hypothesesMining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
Mining ClinicalTrials.gov via CTTI AACT for drug target hypothesesJeremy Yang
 
TIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST APITIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST APIJeremy Yang
 
Ex-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles ExplorerEx-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles ExplorerJeremy Yang
 
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Jeremy Yang
 
Open Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource posterOpen Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource posterJeremy Yang
 
Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)Jeremy Yang
 
Bibliological data science and drug discovery
Bibliological data science and drug discoveryBibliological data science and drug discovery
Bibliological data science and drug discoveryJeremy Yang
 
BioMISS: Language Diversity of Computing
BioMISS: Language Diversity of ComputingBioMISS: Language Diversity of Computing
BioMISS: Language Diversity of ComputingJeremy Yang
 
The Language Diversity of Computing
The Language Diversity of ComputingThe Language Diversity of Computing
The Language Diversity of ComputingJeremy Yang
 
RMSD: routine measure stirs doubts
RMSD: routine measure stirs doubtsRMSD: routine measure stirs doubts
RMSD: routine measure stirs doubtsJeremy Yang
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsJeremy Yang
 
Molecular scaffolds poster
Molecular scaffolds posterMolecular scaffolds poster
Molecular scaffolds posterJeremy Yang
 
Molecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discoveryMolecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discoveryJeremy Yang
 
The BADAPPLE promiscuity plugin for BARD
The BADAPPLE promiscuity plugin for BARDThe BADAPPLE promiscuity plugin for BARD
The BADAPPLE promiscuity plugin for BARDJeremy Yang
 
Cheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case StudiesCheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case StudiesJeremy Yang
 
How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...Jeremy Yang
 
UNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applicationsUNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applicationsJeremy Yang
 
Cyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingCyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingJeremy Yang
 

Más de Jeremy Yang (20)

TIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS AnalyticsTIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS Analytics
 
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizerDrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
 
Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
Mining ClinicalTrials.gov via CTTI AACT for drug target hypothesesMining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
 
TIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST APITIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST API
 
Ex-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles ExplorerEx-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles Explorer
 
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
 
Open Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource posterOpen Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource poster
 
Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)
 
Bibliological data science and drug discovery
Bibliological data science and drug discoveryBibliological data science and drug discovery
Bibliological data science and drug discovery
 
BioMISS: Language Diversity of Computing
BioMISS: Language Diversity of ComputingBioMISS: Language Diversity of Computing
BioMISS: Language Diversity of Computing
 
The Language Diversity of Computing
The Language Diversity of ComputingThe Language Diversity of Computing
The Language Diversity of Computing
 
RMSD: routine measure stirs doubts
RMSD: routine measure stirs doubtsRMSD: routine measure stirs doubts
RMSD: routine measure stirs doubts
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformatics
 
Molecular scaffolds poster
Molecular scaffolds posterMolecular scaffolds poster
Molecular scaffolds poster
 
Molecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discoveryMolecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discovery
 
The BADAPPLE promiscuity plugin for BARD
The BADAPPLE promiscuity plugin for BARDThe BADAPPLE promiscuity plugin for BARD
The BADAPPLE promiscuity plugin for BARD
 
Cheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case StudiesCheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case Studies
 
How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...
 
UNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applicationsUNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applications
 
Cyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingCyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in Biocomputing
 

Último

Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17Celine George
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
The role of Geography in climate education: science and active citizenship
The role of Geography in climate education: science and active citizenshipThe role of Geography in climate education: science and active citizenship
The role of Geography in climate education: science and active citizenshipKarl Donert
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
physiotherapy in Acne condition.....pptx
physiotherapy in Acne condition.....pptxphysiotherapy in Acne condition.....pptx
physiotherapy in Acne condition.....pptxAneriPatwari
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptxmary850239
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Celine George
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxMadhavi Dharankar
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroomSamsung Business USA
 

Último (20)

Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
CARNAVAL COM MAGIA E EUFORIA _
CARNAVAL COM MAGIA E EUFORIA            _CARNAVAL COM MAGIA E EUFORIA            _
CARNAVAL COM MAGIA E EUFORIA _
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
The role of Geography in climate education: science and active citizenship
The role of Geography in climate education: science and active citizenshipThe role of Geography in climate education: science and active citizenship
The role of Geography in climate education: science and active citizenship
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
physiotherapy in Acne condition.....pptx
physiotherapy in Acne condition.....pptxphysiotherapy in Acne condition.....pptx
physiotherapy in Acne condition.....pptx
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptx
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom
 

Promiscuous patterns and perils in PubChem and the MLSCN

  • 1. Promiscuous patterns and perils in PubChem and the MLSCN Jeremy J Yang Cristian Bologa Tudor Oprea Division of Biocomputing Dept of Biochem. & Mol. Biology NM Mol. Libraries Screening Center University of New Mexico
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 15. Pre-filtering at UNM Java servlet using ChemAxon/JChem
  • 16. Activity multiplicity – all assays compounds active in any assay peril? 103894 MLSCN compounds 724 MLSCN assays
  • 17. Activity multiplicity – screening assays compounds active in any screening assay peril 89954 MLSCN compounds 268 MLSCN assays
  • 18. Top 100 hier-scaffolds*, active PubChem MLSMR Example scaffold #1 *Wilkins, J. Med Chem 2005
  • 19. Top active hier-scaffolds: example #1 Scaffold: 33 rd most common 510 active compounds 34 assays c1c2c([nH]c(=O)cn2)ncn1 Top 12 compounds Top 12 of 510 compounds CID, #assays active
  • 20. Top active hier-scaffolds: example #1 #compounds vs #assays in which they are active
  • 21. Top active hier-scaffolds: example #1 All from NCGC
  • 22. Other promiscuous scaffolds COMPOUNDS:total/tested/active ASSAYS:tested/active SAMPLES:tested/active
  • 23. Other promiscuous scaffolds COMPOUNDS:total/tested/active ASSAYS:tested/active SAMPLES:tested/active
  • 24. Other promiscuous scaffolds COMPOUNDS:total/tested/active ASSAYS:tested/active SAMPLES:tested/active
  • 25. Activity mining with PubChem GUI Lots of great functionality, but not everything...
  • 26. Activity mining with command line Automation is good...
  • 27. Example scaffold #2 62 nd most common 523 active compounds 208 assays c1ccc(cc1)NC(=O)c2ccco2 Top active hier-scaffolds: example #2 Top 12 of 523 compounds CID, #assays active
  • 28. Top active hier-scaffolds: example #2 #compounds vs #assays in which they are active
  • 29. Top active hier-scaffolds: example #2 #compounds vs #assays in which they are active
  • 30. Top active hier-scaffolds: example #3 Example scaffold #4 1307 th most common 27 active compounds 140 assays c1nc-2c(=O)[nH]c(=O)nc2[nH]n1 toxoflavin
  • 31. Top active hier-scaffolds: example #3 #compounds vs #assays in which they are active
  • 32. Digression: PubChem bug... substructure search for CID 66541 143 hits substructure search for C2(C1=NC=NNC1=NC(N2)=O)=O 143 hits substructure search for c1nc-2c(=O)[nH]c(=O)nc2[nH]n1 0 hits Ergo: aromatic structural queries not allowed?
  • 34. More example scaffolds and histograms of #compounds vs. #assays in which compounds are active
  • 35.
  • 36.
  • 37.
  • 38.
  • 39. Now for some general comments... 10 6 10 0
  • 40. Probability – in HTS game etc. In other words: Quantity vs. Quality E not striking out = 1 – E striking out E success = 1 – (1 - E hit ) N where E hit = probability of hit per try N = number of tries
  • 41.
  • 42. Probability, more quantity vs quality “ Quantitative high-throughput screening: A titration-based approach that efficiently identifies biological activities in large chemical libraries”, Inglese et al., PNAS, 103, 2006, 11473-11478.
  • 43. Probability and prejudice Lucky CEOs, coaches, and fund managers, Kahneman's 2002 Nobel prize for economics-psychology, good stories vs. Occam's razor, confirmation bias, pathological pattern-recognizers are we.
  • 44. Statistics and signficance “ The Trouble with QSAR (or How I Learned To Stop Worrying and Embrace Fallacy)”, Stephen R. Johnson, J. Chem. Inf. Model., 48 (1), 25 -26, 2008.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50. HierScaffolds: - scaffolds are one or several rings connected by linkers - compounds can be related by any of their scaffolds