Using Chemicalize.org with Other Open
Resources to Extract SAR from Patents and
     Explore Intersects in PubChem




                     Christopher Southan

             ChrisDS Consulting, Göteborg, Sweden,

    Prepared for the ChemAxon UGM, May 2012, version 2nd May




                                                               [1]
Key Relationships in Patents and Papers
                                                                            MAQALPWLLLWMGAGVLPAHGTQHGIRLPLRSGLGGA
                                                                            PLGLRLPRETDEEPEEPGRRGSFVEMVDNLRGKSGQGY
                                                                            YVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQ
                                                                            RQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGP
                                                                            NVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDD
                                                                            SLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASV
                                                                            GGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQ
                                                                            DLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKA
                                                                            ASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLM
                                                                            GEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSS
                                                                            TGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRT
                                                                            AAVEGPFVTLDMEDCGYNIPQTDESTLMTIAYVMAAICAL
                                                                            FMLPLCLMVCQWRCLRCLRQQHDDFADDISLLK




     Document            Assay            Result             Compound             Target

Discerning and mapping these relationships from documents is crucial and demanding.

Chemicalize.org is a significant advance in open chemistry extraction.

2011 http://www.ncbi.nlm.nih.gov/pubmed/21569515
2010 http://www.citeulike.org/user/cdsouthan/article/8637426
2012 http://www.slideshare.net/cdsouthan/southan-bio-it2012patents
                                                                                                             [2]
Practical Utilities

• Name-to-struc (n>s) for selected or batch conversions from
  patents, papers, abstracts, web pages and other sources
• Intersect different content at identity or similarity level
• Molecular properties and bulk download
• Extracted structures archived, searchable and sharable
• Similarity display of analogue series from a document
• Bulk upload to PubChem for intersects and triage
• Result display in JChem for Excel
• Can iterate with OPSIN for IUPAC fixes
                                                                [3]
Chemicalize.org Exploitation Challenges

• Specific retrieval of patent or other source (e.g. target recall)
• Working different sources (e.g. CiteXplore/espace/Scibite for retrieval,
  Google for cross-checks, WIPO for images and tables,
  Freepatentsonline for deeper queries)
• Eyeballing original documents for relevant sections
• Locating exemplified drug-relevant/lead-like structures with data links
• For many patents examples >> activity data links > potent structures
• Selecting best sources/family members for optimal IUPAC extraction
  quality (e.g. US pats and FPO)
• Filtering novel structures from common chemistry
• Need to be PubChem cognizant for effective triage
• For a variety of reasons some documents have low extraction rates
• Tricks and workarounds enhance exploitation


                                                                             [4]
Target Recall: CiteXplore




•   Title only "DPPIV"                           Medline = 37       Patents = 31
•   Title + abstract "DPPIV"                    Medline = 402      Patents = 144
•   Title + abstract "dipeptidyl peptidase"     Medline = 4,838    Patents = 1,520
•   Title + abstract "inhibitor"                Medline = 772,053  Patents = 124,516
•   Title + abstract "diabetes"                 Medline = 431,299  Patents = 36,792
•   Title + abstract "DPPIV OR dipeptidyl peptidase AND inhibitor AND diabetes"
                                                Medline = 1,105    Patents = 604

CiteXplore is restricted to EBI patent abstracts, so you can get higher recall from
full-text sources such as SureChemOpen, EPO/espace, WIPO and FPO
(but these cannot search Medline in parallel)
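
The combined Boolean query can also be written out programmatically. The sketch below only builds a search URL; it assumes the Europe PMC REST search service (the successor to CiteXplore), its `SRC:MED` and `SRC:PAT` source filters, and that the slide's query is meant as (target synonyms) AND inhibitor AND diabetes. It does not restrict the search to title and abstract, so counts would not necessarily match those above.

```python
from urllib.parse import urlencode

# Assumption: Europe PMC REST search endpoint, not part of the original slides.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(source: str) -> str:
    """Return the Boolean query limited to one source.

    source is "MED" for Medline records or "PAT" for patent abstracts.
    The parentheses make the assumed operator grouping explicit.
    """
    target = '(DPPIV OR "dipeptidyl peptidase")'
    return f"{target} AND inhibitor AND diabetes AND SRC:{source}"


def build_url(source: str) -> str:
    """Return a search URL; pageSize=1 because only the hit count is of interest."""
    params = {"query": build_query(source), "format": "json", "pageSize": 1}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for src in ("MED", "PAT"):
        print(src, build_url(src))
```

Fetching each URL and comparing the reported hit counts for the two sources would mirror the Medline versus Patents columns on the slide.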

                                                                                   [5]
Target Alerts: SciBite




                      US2012040982
                          DPPIV
                    Boehringer Ingelheim
                         Feb 2012
                                          [6]
Slicing and Dicing US2012040982 (I)




•   Chemicalize converted 1,390 structures from the FreePatentsOnline (FPO) URL
•   From the 497 examples 486 converted
•   Need to scan the document and iterate with scroll bar to spot lead-like structures
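
As a small worked check of the extraction rate for the examples, the following sketch uses only the two counts quoted above (497 examples, 486 converted); the function name is illustrative.

```python
def conversion_rate(converted: int, total: int) -> float:
    """Fraction of names that were converted to structures."""
    if total <= 0:
        raise ValueError("total must be positive")
    return converted / total


examples_total = 497      # examples in US2012040982 (from the slide)
examples_converted = 486  # examples converted by chemicalize (from the slide)

rate = conversion_rate(examples_converted, examples_total)
missed = examples_total - examples_converted
print(f"{examples_converted}/{examples_total} converted ({rate:.1%}), {missed} not converted")
```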
                                                                                         [7]
Slicing and Dicing US2012040982 (II)




• OPSIN picks up some of what chemicalize misses (e.g. 389 above) but not all
• OPSIN error reports may help fix a series for Chemicalize (e.g. 1 vs. L)
• Practically more important if that example has potent activity

                                                                            [8]
Slicing and Dicing US2012040982 (III)




•   Similarity display clearly picks out the lead-like analog series (top)
•   Select via FPO text > example list only, > Word > PDF > chemicalize
    upload > SDF download 486 structures (bottom)
•   However, from the partial descriptions these may include prophetics
•   Also download 28 claimed examples via PDF                                [9]
Slicing and Dicing US2012040982 (IV)




•   Can locate an SAR table with 11 point IC50s
•   But.... only 9 examples below 100 nM, example 25 is 56 nM
•   The designation of series 1 and 2 obfuscates their example identity
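
Once the SAR table has been transcribed, picking out the potent examples is a simple filter. This is a minimal sketch: only example 25 at 56 nM comes from the slide, the other rows are hypothetical placeholders, and the 100 nM cutoff follows the bullet above.

```python
from typing import NamedTuple


class SarRow(NamedTuple):
    example: int
    ic50_nm: float


def potent(rows, cutoff_nm=100.0):
    """Return rows with IC50 strictly below the cutoff, most potent first."""
    return sorted((r for r in rows if r.ic50_nm < cutoff_nm), key=lambda r: r.ic50_nm)


rows = [
    SarRow(25, 56.0),    # value quoted on the slide
    SarRow(101, 340.0),  # hypothetical
    SarRow(102, 12.0),   # hypothetical
    SarRow(103, 100.0),  # hypothetical, exactly at the cutoff so excluded
]

for row in potent(rows):
    print(f"example {row.example}: {row.ic50_nm:g} nM")
```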
                                                                          [10]
PubChem Triage of Chemicalize Output (I)




•   Example 25 SMILES > neither an exact match nor tautomer – thus novel
•   Repeat search at 95% Tanimoto > 289 neighbors > cluster
•   Closest PubChem analog > ChemSpider > SureChem > Novo Nordisk DPPIV
    patent from 2005                                                     [11]
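
For readers unfamiliar with the similarity threshold, the Tanimoto coefficient is the number of fingerprint bits two structures share divided by the number of bits set in either. The sketch below uses toy bit sets rather than real fingerprints (which would come from a cheminformatics toolkit or from PubChem itself), so the names and values are purely illustrative.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def neighbors(query: frozenset, library: dict, threshold: float = 0.95):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])


# Toy fingerprints: hypothetical bit positions, not derived from real molecules.
query = frozenset(range(20))
library = {
    "identical": frozenset(range(20)),
    "close_analog": frozenset(range(21)),  # one extra bit: 20/21, about 0.952
    "distant": frozenset(range(10)),       # 10/20 = 0.5
}

for name, score in neighbors(query, library):
    print(f"{name}: {score:.3f}")
```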
PubChem Triage of Chemicalize Output (II)




•   Total extraction from US2012040982 > 1,390 SDFs > 1387 uploaded > 7 “failed”
•   493 exact matches (= preexisting PubChem CIDs)
•   486 example-only SDFs > upload > 21 exact-match CIDs
•   34 claims-only give 9 exact-match CIDs, primary sources were:
    •   5 from ChEMBL from a Boehringer Ingelheim 2007 publication
    •   7 from Thomson Pharma
    •   2 from ChemSpider with SureChem links to Boehringer Ingelheim patents
•   Thus 461 examples chemicalized from US2012040982 are “novel” structures
•   However, cannot check enantiomeric or tautomeric inexact matches from
    PubChem interface (only for existing CIDs)
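
One possible way to probe looser identity matches outside the web interface is PubChem's PUG REST service. The sketch below only constructs a request URL and assumes the `fastidentity` operation with its `identity_type` option (for example `same_connectivity` to ignore stereochemistry, or `same_tautomer`); it is not the workflow used for these slides, and SMILES containing characters such as `/` may need to be sent by POST instead. The SMILES used is the example 228 string quoted later in this deck.

```python
from urllib.parse import quote, urlencode

# Assumption: PubChem PUG REST base URL and fastidentity operation.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def identity_url(smiles: str, identity_type: str = "same_connectivity") -> str:
    """Build a PUG REST URL that asks for CIDs matching the SMILES at the given identity level."""
    encoded = quote(smiles, safe="")  # percent-encode '#', '=', '(' and ')'
    query = urlencode({"identity_type": identity_type})
    return f"{PUG_REST}/compound/fastidentity/smiles/{encoded}/cids/TXT?{query}"


if __name__ == "__main__":
    example_228 = (
        "CC#CCN1C(=NC2=C1C(=O)NC(OC1=CC=C(C=C1)C(=O)OC(=O)C(F)(F)F)=N2)N1CCNCC1"
    )
    print(identity_url(example_228))
    print(identity_url(example_228, identity_type="same_tautomer"))
```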
                                                                              [12]
PubChem Triage of Chemicalize Output (III)




•   Chemicalize examples-plus-claims US2012040982 = 29 CIDs (search 36 above)
•   Thomson Pharma/DiscoveryGate intersect is ~ Derwent WPI (search 31)
•   This matched 20 from the 29 (search 36), presumably DWPI extractions
•   ChEMBL (7) matched 6 from 29 (i.e. extracted from papers)
•   SLING matched 8 from 29 (i.e. extracted from EPO patents)
•   It was thus possible to intersect the chemicalize extractions from this patent with
    four independent primary sources in PubChem from patents and publications
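
The intersects described above amount to set operations on CID lists. A minimal sketch, assuming each source has been downloaded as a set of CIDs; every CID below is a made-up placeholder, and only the source names come from the slide.

```python
def intersect_sources(query_cids: set, sources: dict) -> dict:
    """Map each source name to the subset of query CIDs it also contains."""
    return {name: query_cids & cids for name, cids in sources.items()}


# Hypothetical CIDs standing in for the chemicalize-derived list.
patent_cids = {101, 102, 103, 104, 105, 106}

# Hypothetical CID sets for three of the PubChem depositors named above.
sources = {
    "Thomson Pharma": {101, 102, 103, 900},
    "ChEMBL": {103, 104},
    "SLING": {105, 901},
}

hits = intersect_sources(patent_cids, sources)
unmatched = patent_cids - set().union(*sources.values())

for name, cids in hits.items():
    print(f"{name}: {len(cids)} of {len(patent_cids)} matched")
print(f"matched by no source: {sorted(unmatched)}")
```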

                                                                                          [13]
Patent "Walking" from Chemicalize similarity results (I)




•   The similarity results from one example gave 1,734 matches out to Tanimoto 0.5,
    extending "beyond" the example space of US2012040982.
•   Scrolling through these shows that matches at Tanimoto 0.6, with shared substructures
    in blue, connect to a different, older patent, US7772226, also for DPPIV, from Eisai

                                                                                 [14]
Patent "Walking" from Chemicalize similarity results (II)




•   US7772226 from FPO converted 1127 (i.e. more than the 992 from PatBase)
•   680 matched PubChem CIDs
•   Example 228 CC#CCN1C(=NC2=C1C(=O)NC(OC1=CC=C(C=C1)C(=O)OC(=O)C(F)(F)F)=N2)N1CCNCC1
    had a 12 nM IC50 for DPPIV
•   Can even "walk" to a third DPPIV patent WO2007071738 from Novartis
                                                                                         [15]
Extracting from CiteXplore ChEMBL

                         • CiteXplore lists ChEMBL
                           IUPACs and IDs

                         • Can chemicalize all
                           ChEMBL structures from
                           one paper

                         • Difficult to ID these in
                           ChEMBL

                         • Upload 8 structures to
                           PubChem

                         • 7 match ChEMBL IDs

                         • Only one matches the 29
                           from US2012040982

                         • Thus paper probably from
                           multiple patents
                                                      [16]
Mining PubMed Central Full-text Papers (I)




•    Only a few examples converted directly
•    So > WordPad > direct chemicalize (iterate) > web page (Google Sites)
•    Download > Upload to JChem for Excel
•    Add in IC50 values from paper
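
Adding the activity values to the downloaded structures is a join on the example number. The sketch below assumes the structures are available as example-to-SMILES pairs and writes a CSV that a spreadsheet can open; the SMILES and IC50 values are placeholders, not data from the paper.

```python
import csv
import io

FIELDS = ["example", "smiles", "ic50_nm"]


def merge_sar(structures: dict, ic50_by_example: dict) -> list:
    """Attach an IC50 (blank if not reported) to every extracted structure."""
    return [
        {"example": ex, "smiles": smi, "ic50_nm": ic50_by_example.get(ex, "")}
        for ex, smi in structures.items()
    ]


def to_csv(rows: list) -> str:
    """Serialise merged rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder inputs: hypothetical example numbers, SMILES and IC50 values.
structures = {"1": "CCO", "2": "CCN"}
ic50_by_example = {"1": 12.0}

print(to_csv(merge_sar(structures, ic50_by_example)))
```

Leaving the IC50 blank rather than dropping the row keeps structures without reported activity visible in the table.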

                                                                             [17]
Mining PubMed Central Full-text Papers (II)




                         •   Add the SAR data from
                             the paper into the
                             structure table

                         •   These had no exact
                             matches in PubChem




                                                     [18]
Chemicalizing the DrugBank Entry for DPPIV


                             41 conversions of
                             inhibitors, many are PDB
                             ligands




                                                       [19]
Can Even Extract Catalogues that have no
          SMILES or InChIs....




                                   Tocris DPPIV
                                   inhibitor >
                                   chemicalize >

                                   PubChem > 6
                                   analogs




                                               [20]
Conclusions
•   Chemicalize.org is powerful, flexible and free, as in beer....
•   Significantly enables small-scale roll-your-own patent mining
•   Ditto for journal article/abstract mining (e.g. for papers not captured in ChEMBL)
•   You still need perspicacity to discern SAR details
•   Complementary to commercial patent databases populated by manual extraction
    (e.g. you can extract more structures)
•   Commercial automated patent extraction databases typically combine ChemAxon
    n>s with other algorithms (e.g.
    http://www.chemaxon.com/library/benchmarking-chemaxon%E2%80%99s-name-to-structure-batch-tool-on-patent-text/)
•   While they thus out-perform chemicalize, it is still very useful for intersecting
    journal articles or other sources against any databases
•   Significant novel content (w.r.t. public databases) is accumulating via "default
    crowdsourcing" in the chemicalize archive, which becomes an important
    cross-check source and can be "walked" between documents
•   Combined with OPSIN and OSRA, structures from most sources are extractable
•   Synergies with sources such as PubChem, PubMed Central, ChEMBL and
    SureChemOpen will advance academic drug discovery and chemical biology


                                                                                    [21]
Questions Welcome

ChrisDS Consulting: http://www.cdsouthan.info/Consult/CDS_cons.htm
Mobile: +46(0)702-530710
Skype: cdsouthan
Email: cdsouthan – at - hotmail.com
Twitter:       http://twitter.com/#!/cdsouthan
Blog:          http://cdsouthan.blogspot.com/ (includes postings on patent themes)
LinkedIN:      http://www.linkedin.com/in/cdsouthan
Website:       http://www.cdsouthan.info/CDS_prof.htm
Publications: http://www.citeulike.org/user/cdsouthan/publications/order/year
Citations:     http://scholar.google.com/citations?user=y1DsHJ8AAAAJ&hl=en
Presentations: http://www.slideshare.net/cdsouthan




                                                                                     [22]

Exploring SAR between Patents and PubChem

  • 1. Using Chemicalize.org with Other Open Resources to Extract SAR from Patents and Explore Intersects in PubChem Christopher Southan ChrisDS Consulting, Göteborg, Sweden, Prepared for the ChemAxon UGM, May 2012, version 2nd May [1]
  • 2. Key Relationships in Patents and Papers • Document, Assay, Result, Compound, Target • Protein sequence shown on the slide: MAQALPWLLLWMGAGVLPAHGTQHGIRLPLRSGLGGA PLGLRLPRETDEEPEEPGRRGSFVEMVDNLRGKSGQGY YVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQ RQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGP NVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDD SLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASV GGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQ DLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKA ASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLM GEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSS TGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRT AAVEGPFVTLDMEDCGYNIPQTDESTLMTIAYVMAAICAL FMLPLCLMVCQWRCLRCLRQQHDDFADDISLLK • Discerning and mapping these relationships from documents is crucial and demanding • Chemicalize.org is a significant advance in open chemistry extraction • References shown on the slide: 2011 http://www.ncbi.nlm.nih.gov/pubmed/21569515 ; 2010 http://www.citeulike.org/user/cdsouthan/article/8637426 ; 2012 http://www.slideshare.net/cdsouthan/southan-bio-it2012patents [2]
  • 3. Practical Utilities • Name-to-structure (n>s) for selected or batch conversions from patents, papers, abstracts, web pages and other sources • Intersect different content at identity or similarity level • Molecular properties and bulk download • Extracted structures archived, searchable and sharable • Similarity display of analogue series from a document • Bulk upload to PubChem for intersects and triage • Result display in JChem for Excel • Can iterate with OPSIN for IUPAC fixes [3]
  • 4. Chemicalize.org Exploitation Challenges • Specific retrieval of patent or other source (e.g. target recall) • Working different sources (e.g. CiteXplore/espace/SciBite for retrieval, Google for cross-checks, WIPO for images and tables, FreePatentsOnline for deeper queries) • Eyeballing original documents for relevant sections • Locating exemplified drug-relevant/lead-like structures with data links • For many patents: examples >> activity data links > potent structures • Selecting best sources/family members for optimal IUPAC extraction quality (e.g. US patents and FPO) • Filtering novel structures from common chemistry • Need to be PubChem-cognizant for effective triage • For a variety of reasons some documents have low extraction rates • Tricks and workarounds enhance exploitation [4]
  • 5. Target Recall: CiteXplore • Title only "DPPIV": Medline = 37, Patents = 31 • Title + abstract "DPPIV": Medline = 402, Patents = 144 • Title + abstract "dipeptidyl peptidase": Medline = 4,838, Patents = 1,520 • Title + abstract "inhibitor": Medline = 772,053, Patents = 124,516 • Title + abstract "diabetes": Medline = 431,299, Patents = 36,792 • Title + abstract "DPPIV OR dipeptidyl peptidase AND inhibitor AND diabetes": Medline = 1,105, Patents = 604 • CiteXplore is restricted to EBI patent abstracts, so you can get higher recall at full-text sources such as SureChemOpen, EPO/espace, WIPO and FPO (but these do not search Medline in parallel) [5]
  • 6. Target Alerts: SciBite • US2012040982, DPPIV, Boehringer Ingelheim, Feb 2012 [6]
  • 7. Slicing and Dicing US2012040982 (I) • Chemicalize converted 1,390 structures from the FreePatentsOnline (FPO) URL • Of the 497 examples, 486 converted • Need to scan the document and iterate with the scroll bar to spot lead-like structures [7]
  • 8. Slicing and Dicing US2012040982 (II) • OPSIN picks up some of what Chemicalize misses (e.g. 389 above) but not all • OPSIN error reports may help fix a series for Chemicalize (e.g. 1 vs. L) • Practically more important if that example has potent activity [8]
  • 9. Slicing and Dicing US2012040982 (III) • Similarity display clearly picks out the lead-like analog series (top) • Select via FPO text > example list only > Word > PDF > Chemicalize upload > SDF download of 486 structures (bottom) • However, judging from the partial descriptions, these may include prophetic examples • Also download 28 claimed examples via PDF [9]
  • 10. Slicing and Dicing US2012040982 (IV) • Can locate an SAR table with 11 point IC50s • But only 9 examples are below 100 nM; example 25 is 56 nM • The designation of series 1 and 2 obfuscates their example identity [10]
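
The potency filter on slide 10 (keep examples below 100 nM) can be sketched in a few lines. This is a minimal illustration, not part of the original workflow: the function names are made up here, and only the 56 nM value for example 25 comes from the slide. The other table entries are hypothetical placeholders.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 (-log10 of the molar IC50)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


def potent_examples(ic50_by_example: dict, cutoff_nm: float = 100.0) -> dict:
    """Keep examples whose IC50 is below the cutoff, most potent first."""
    hits = {ex: v for ex, v in ic50_by_example.items() if v < cutoff_nm}
    return dict(sorted(hits.items(), key=lambda kv: kv[1]))


# Only "example 25" (56 nM) is taken from the slide; the rest are hypothetical.
sar_table = {
    "example 25": 56.0,
    "hypothetical A": 250.0,
    "hypothetical B": 1200.0,
}

hits = potent_examples(sar_table)
for name, ic50 in hits.items():
    print(f"{name}: IC50 = {ic50} nM, pIC50 = {pic50_from_nm(ic50):.2f}")
```

Sorting by IC50 puts the most potent examples first, which is the order in which one would normally carry them forward to a PubChem check.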
  • 11. PubChem Triage of Chemicalize Output (I) • Example 25 SMILES > neither an exact match nor tautomer – thus novel • Repeat search at 95% Tanimoto > 289 neighbors > cluster • Closest PubChem analog > ChemSpider > SureChem > Novo Nordisk DPPIV patent from 2005 [11]
  • 12. PubChem Triage of Chemicalize Output (II) • Total extraction from US2012040982 > 1,390 SDFs > 1,387 uploaded > 7 “failed” • 493 exact matches (= preexisting PubChem CIDs) • 486 example-only SDFs > upload > 21 exact-match CIDs • 34 claims-only give 9 exact-match CIDs, primary sources were: • 5 from ChEMBL from a Boehringer Ingelheim 2007 publication • 7 from Thomson Pharma • 2 from ChemSpider with SureChem links to Boehringer Ingelheim patents • Thus 461 examples chemicalized from US2012040982 are “novel” structures • However, cannot check enantiomeric or tautomeric inexact matches from the PubChem interface (only for existing CIDs) [12]
  • 13. PubChem Triage of Chemicalize Output (III) • Chemicalize examples-plus-claims from US2012040982 = 29 CIDs (search 36 above) • Thomson Pharma/DiscoveryGate intersect is ~ Derwent WPI (search 31) • This matched 20 of the 29 (search 36), presumably DWPI extractions • ChEMBL (7) matched 6 of the 29 (i.e. extracted from papers) • SLING matched 8 of the 29 (i.e. extracted from EPO patents) • It was thus possible to intersect the Chemicalize extractions from this patent with four independent primary sources in PubChem from patents and publications [13]
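
The identity-level intersects in the triage slides amount to set operations on structure identifiers. The sketch below assumes each structure has already been reduced to a canonical identifier string (for instance an InChIKey or a PubChem CID). The identifiers and the function names are hypothetical placeholders, and the source names are used only as labels, not as real query results.

```python
def intersect_by_source(extracted: set, sources: dict) -> dict:
    """For each named source, return the extracted identifiers it already holds."""
    return {name: extracted & ids for name, ids in sources.items()}


def novel_structures(extracted: set, sources: dict) -> set:
    """Identifiers found in none of the sources (exact-match novelty only)."""
    known = set().union(*sources.values())
    return extracted - known


# Hypothetical identifiers standing in for structures extracted from one patent.
extracted = {"ID-001", "ID-002", "ID-003", "ID-004", "ID-005"}

# Hypothetical source contents; the labels echo the slides, the data do not.
sources = {
    "ChEMBL": {"ID-001", "ID-002", "ID-900"},
    "Thomson Pharma": {"ID-002", "ID-003"},
    "SLING": {"ID-003", "ID-901"},
}

overlaps = intersect_by_source(extracted, sources)
novel = novel_structures(extracted, sources)

for name, ids in overlaps.items():
    print(f"{name}: {len(ids)} of {len(extracted)} matched")
print("no exact match in any source:", sorted(novel))
```

As the slides caution, an exact-match test of this kind says nothing about enantiomeric or tautomeric near-matches, so "novel" here means only "no identical identifier found".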
  • 14. Patent "Walking" from Chemicalize Similarity Results (I) • The similarity results from one example gave 1,734 matches out to Tanimoto 0.5, extending "beyond" the example space of US2012040982 • Scrolling these shows that matches at Tanimoto 0.6, with shared substructures in blue, connect to a different, older patent, US7772226, also for DPPIV, from Eisai [14]
  • 15. Patent ”Walking” from Chemicalize similarity results (II) • US7772226 from FPO converted 1127 (i.e. more than the 992 from PatBase) • 680 matched PubChem CIDs • Example 228 CC#CCN1C(=NC2=C1C(=O)NC(OC1=CC=C(C=C1)C(=O)OC(=O)C(F)(F)F)=N2)N1CCNCC1 had a 12 nM IC50 for DPPIV • Can even ”walk” to a third DPPIV patent WO2007071738 from Novartis [15]
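
The Tanimoto thresholds quoted in the triage and patent-walking slides (0.95, 0.6, 0.5) refer to fingerprint similarity. The sketch below shows the coefficient itself on toy fingerprints represented as sets of "on" bit positions. The fingerprints, names and helper functions are hypothetical; the real fingerprints used by Chemicalize or PubChem are not reproduced here.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def neighbours(query: frozenset, library: dict, threshold: float) -> list:
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    kept = [(name, s) for name, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints: each set lists the positions of bits that are switched on.
query = frozenset({1, 2, 3, 4})
library = {
    "close analogue": frozenset({1, 2, 3, 4, 5}),   # 4 shared / 5 in union
    "distant analogue": frozenset({1, 2, 3, 6, 7}),  # 3 shared / 6 in union
    "unrelated": frozenset({8, 9}),                  # nothing shared
}

wide = neighbours(query, library, threshold=0.5)
tight = neighbours(query, library, threshold=0.6)
print("Tanimoto >= 0.5:", wide)
print("Tanimoto >= 0.6:", tight)
```

Lowering the threshold widens the neighbourhood, which is what lets the similarity list reach structures from a different patent.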
  • 16. Extracting from CiteXplore ChEMBL • CiteXplore lists ChEMBL IUPACs and IDs • Can chemicalize all ChEMBL structures from one paper • Difficult to ID these in ChEMBL • Upload 8 structures to PubChem • 7 match ChEMBL IDs • Only one matches the 29 from US2012040982 • Thus the paper probably draws on multiple patents [16]
  • 17. Mining PubMed Central Full-text Papers (I) • Only a few examples converted directly • So > WordPad > direct chemicalize (iterate) > web page (Google Sites) • Download > upload to JChem for Excel • Add in IC50 values from paper [17]
  • 18. Mining PubMed Central Full-text Papers (II) • Add the SAR data from the paper into the structure table • These had no exact matches in PubChem [18]
  • 19. Chemicalizing the DrugBank Entry for DPPIV • 41 conversions of inhibitors, many are PDB ligands [19]
  • 20. Can Even Extract Catalogues that have no SMILES or InChIs.... Tocris DPPIV inhibitor > chemicalize > PubChem > 6 analogs [20]
  • 21. Conclusions • Chemicalize.org is powerful, flexible and free, as in beer • Significantly enables small-scale roll-your-own patent mining • Ditto for journal article/abstract mining (e.g. for papers not captured in ChEMBL) • You still need perspicacity to discern SAR details • Complementary to commercial patent databases populated by manual extraction (e.g. you can extract more structures) • Commercial automated patent extraction databases typically combine ChemAxon n>s with other algorithms (e.g. http://www.chemaxon.com/library/benchmarking-chemaxon%E2%80%99s-name-to-structure-batch-tool-on-patent-text/) • While they thus out-perform Chemicalize, it is still very useful for intersecting journal articles or other sources against any databases • Significant novel content (w.r.t. public databases) is accumulating via "default crowdsourcing" in the Chemicalize archive, which becomes an important cross-check source and can be "walked" between documents • Combined with OPSIN and OSRA, structures from most sources are extractable • Synergies with sources such as PubChem, PubMed Central, ChEMBL and SureChemOpen will advance academic drug discovery and chemical biology [21]
  • 22. Questions Welcome • ChrisDS Consulting: http://www.cdsouthan.info/Consult/CDS_cons.htm • Mobile: +46(0)702-530710 • Skype: cdsouthan • Email: cdsouthan – at - hotmail.com • Twitter: http://twitter.com/#!/cdsouthan • Blog: http://cdsouthan.blogspot.com/ (includes postings on patent themes) • LinkedIn: http://www.linkedin.com/in/cdsouthan • Website: http://www.cdsouthan.info/CDS_prof.htm • Publications: http://www.citeulike.org/user/cdsouthan/publications/order/year • Citations: http://scholar.google.com/citations?user=y1DsHJ8AAAAJ&hl=en • Presentations: http://www.slideshare.net/cdsouthan [22]