Patent Cheminformatics

Identification of key compounds in patents


3rd Strasbourg Summer School on Chemoinformatics
Strasbourg, France, 25-29 June 2012


Dr. Sorel Muresan

AstraZeneca R&D Mölndal
Outline



Patents, brief intro

Sources for accessing full text patents

Compound extraction from patents

Key compound prediction
What is a patent?

A patent application is an agreement between
inventor and state, allowing an inventor a monopoly
over their invention for a limited time. In the EU,
applicants are required to disclose their inventions
in a manner sufficiently clear and complete for them to be
carried out by a person skilled in the art. In the United
States, inventors are additionally required to include the
‘best mode’ of making or practicing the invention.
Patents are very interesting documents

 Three reasons why life scientists should read patents more
   frequently

      1. Some information appears earlier in patents than in scientific
         journals

      2. Patents may contain sound data that never appear in the
         literature.

      3. Patents are a source of hard-to-get information from
         commercial suppliers.




Seeber, F. Nat. Protocols 2007
Patents as pharmaceutical data source
Complementarity between journals and patents
 “In certain fields, new advances are disclosed in patents long before
 they are published in peer-reviewed journals.”         Grubb. W.P.

 Example: “Novel Cannabinoid-1 Receptor Inverse Agonist for the Treatment of Obesity” (modulates CNR1)

   Patent application (Nov 2002) -> ~18 months -> Patent publication (Mar 2004) -> 2.5 years -> Journal publication (Dec 2006)

   USPTO patent (PN: US20040058820); journal article (PMID: 17181138)
Unique chemistry from patents
   Data from AstraZeneca’s Chemistry Connect




               ~6% of compound structures exemplified in patents
               were also published in journal articles


Muresan, S. et al. DDT 2011
Southan, C. et al J.Cheminfo. 2009
Anatomy of a patent

   Front page - contains a wealth of information about
     the patent

   Detailed description of the invention - the heart of a
     patent application. It generally describes one or more
     preferred embodiments of the invention in enough detail
     to enable someone of ordinary skill in the art to make or
     use the invention without having to resort to undue
     experimentation

   Claims - the most important part of the patent application.
     Define the scope of patent protection afforded to the
     owner of a patent.

C.P. Miller, M.J. Evans, The Chemist's Companion Guide to Patent Law, 2011
Searching for patents

Official public portals




Open full-text




Structure -> Patents & Patent -> Structure
Manually extracted SAR data (commercial)

• GOSTAR (GVKBIO Online Structure Activity Relationship Database) is a
  comprehensive database that captures explicit relationships between the three
  entities of publications, compounds and sequences.

• It includes 2.6 million compounds linked to 3,500 sequences with 12.5M SAR
  points extracted from 43,000 patents and 67,000 articles from 125 journals
Annotate patents manually
Brat – rapid annotation tool (http://brat.nlplab.org/index.html)
Name-2-structure
OPSIN – http://opsin.ch.cam.ac.uk/
Chemical Named Entity Recognition (CNER)



   Name:    N-Ethyl-p-menthane-3-carboxamide
               -> Name-to-Structure software ->
   SMILES:  C(C)NC(=O)C1CC(CCC1C(C)C)C

   WS3 cooling agent (CAS 39711-79-0)
Compound extraction via chemical text mining
Traditional text mining pipeline




• Determining the start and end of IUPAC-like names in free text is a tricky
  business.
• Chemical names can contain whitespace, hyphens, commas,
  parentheses, brackets, braces, apostrophes, superscripts, Greek
  characters, digits and periods.
• This is made harder still by typos, OCR errors, hyphenation, linefeeds,
  XML tags, line and page numbers and similar noise.
OCR Errors: Compound Names


• lH-ben zimidazole → 1H-benzimidazole

• 4- (2-ADAMANTYLCARBAM0YL) -5-TERT-BUTYL-
  PYRAZOL-1-YL] BENZOIC ACID →
  4-(2-adamantylcarbamoyl)-5-tert-butyl-pyrazol-1-
  yl]benzoic acid

• triphenylposhine → triphenylphosphine
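
The repairs above can be sketched with a few rules plus a dictionary lookup. This is a minimal illustration, not the method used in the talk: the helper names (`normalize_ocr`, `join_split_words`, `correct_spelling`) and the three-word `VOCABULARY` are hypothetical, and a real pipeline would need a far larger dictionary and more careful rules.

```python
import difflib
import re

# Hypothetical mini-vocabulary; a real system would use a large name dictionary.
VOCABULARY = ["triphenylphosphine", "benzimidazole", "adamantylcarbamoyl"]


def normalize_ocr(name):
    """Apply a few rule-based repairs for common OCR damage in chemical names."""
    # A lowercase 'l' directly before 'H-' is treated as a misread digit 1.
    name = re.sub(r"\blH-", "1H-", name)
    # Drop spaces that were inserted next to hyphens and inside brackets.
    name = re.sub(r"\s*-\s*", "-", name)
    name = re.sub(r"([(\[])\s+", r"\1", name)
    name = re.sub(r"\s+([)\]])", r"\1", name)
    # A zero flanked by letters is treated as a misread letter O.
    name = re.sub(r"(?<=[A-Za-z])0(?=[A-Za-z])", "O", name)
    return name


def join_split_words(name, vocabulary):
    """Remove one space if doing so makes a known vocabulary word appear."""
    for match in re.finditer(" ", name):
        candidate = name[:match.start()] + name[match.end():]
        for word in vocabulary:
            if word in candidate.lower() and word not in name.lower():
                return candidate
    return name


def correct_spelling(name, vocabulary, cutoff=0.85):
    """Replace a name by its closest vocabulary entry, if one is similar enough."""
    matches = difflib.get_close_matches(name.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else name


print(join_split_words(normalize_ocr("lH-ben zimidazole"), VOCABULARY))
print(normalize_ocr("4- (2-ADAMANTYLCARBAM0YL) -5-TERT-BUTYL"))
print(correct_spelling("triphenylposhine", VOCABULARY))
```

The rules are deliberately narrow: they are meant to run on spans already identified as chemical names, since stripping spaces around hyphens would damage ordinary prose.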
Text Mining Conversion by Name Class
  Class   Category           Names        n2s_1 (%)   n2s_2 (%)   n2s_3 (%)   None (%)
  M       Molecule           7,262,798    81.4        64.8        77.1        7.8
  D       Dictionary         26,876       38.1        45.1        3.5         38.5
  R       Registry number    304,064      0           0           0           100
  C       CAS number         47,815       0           0           0           100
  E       Element            836          92.8        58.5        78.7        3.2
  P       Fragment           2,663,677    72.9        58.9        0           19.8
  A       Atom fragment      96           90.6        36.5        0           6.3
  Y       Polymer            295          0           44.1        22.7        36.9
  G       Generic            1,263        2.6         6.3         0.5         91.9
  N       Noise              104          32.7        24          19.2        52.9
          Total              10,307,824   76.3        61.0        54.3        14.1

  Slide callouts: ChemAxon 5.5 converts 60%; NCI/CADD Chemical Identifier Resolver converts 48%



Muresan, S. ChemAxon UGM 2011
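
The Total row of the table is consistent with a count-weighted average of the per-class conversion rates. The short check below recomputes it from the class rows; the variable and function names are illustrative only.

```python
# (category, number of names, % converted by n2s_1, n2s_2, n2s_3, % converted by none)
ROWS = [
    ("Molecule",        7_262_798, 81.4, 64.8, 77.1,   7.8),
    ("Dictionary",         26_876, 38.1, 45.1,  3.5,  38.5),
    ("Registry number",   304_064,  0.0,  0.0,  0.0, 100.0),
    ("CAS number",         47_815,  0.0,  0.0,  0.0, 100.0),
    ("Element",               836, 92.8, 58.5, 78.7,   3.2),
    ("Fragment",        2_663_677, 72.9, 58.9,  0.0,  19.8),
    ("Atom fragment",          96, 90.6, 36.5,  0.0,   6.3),
    ("Polymer",               295,  0.0, 44.1, 22.7,  36.9),
    ("Generic",             1_263,  2.6,  6.3,  0.5,  91.9),
    ("Noise",                 104, 32.7, 24.0, 19.2,  52.9),
]


def weighted_totals(rows):
    """Return the total name count and the count-weighted percentage per column."""
    total_names = sum(row[1] for row in rows)
    totals = []
    for col in range(2, 6):
        converted = sum(row[1] * row[col] / 100 for row in rows)
        totals.append(round(100 * converted / total_names, 1))
    return total_names, totals


total_names, totals = weighted_totals(ROWS)
print(total_names, totals)
```

Because the Molecule and Fragment classes account for the vast majority of names, the overall rates are dominated by those two rows.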
Extracting compounds from US20100221398
Google Patents + Chemicalize.org (ChemAxon)
What is a key compound?


Drug candidate

Compound(s) with optimal physicochemical properties

Compound(s) with the most suitable pharmacokinetic
 profile

The most biologically active tool or probe
Key compounds from patents

“Old school” techniques
      Look for clues in text: “most preferred” wording
      in claims; crystal form info; scale of synthesis

Structural information alone
      Frequency of group (FOG) analysis of
      exemplified compounds

Structures and SAR data
      Work out SAR using biological data and
      structures
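
The "old school" text clues can be approximated by a keyword scan over each example's text. The sketch below is an assumption-laden illustration: the cue names and regular expressions in `CUE_PATTERNS` are invented for this example and are far from exhaustive.

```python
import re

# Hypothetical cue patterns; real patents use much more varied wording.
CUE_PATTERNS = {
    "preferred wording": re.compile(
        r"\b(?:most|particularly|especially)\s+preferred\b", re.IGNORECASE),
    "crystal form": re.compile(
        r"\b(?:crystalline|polymorph|crystal form)\b", re.IGNORECASE),
    # A quantity in kilograms is taken as a hint of large-scale synthesis.
    "large scale": re.compile(r"\b\d+(?:\.\d+)?\s*kg\b", re.IGNORECASE),
}


def find_cues(text):
    """Return the sorted names of all cues found in a passage of patent text."""
    return sorted(name for name, pattern in CUE_PATTERNS.items()
                  if pattern.search(text))


passage = ("The most preferred compound is Example 12, "
           "obtained as a crystalline solid (1.2 kg).")
print(find_cues(passage))
```

Examples that trigger several cues at once are natural candidates for closer manual inspection.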
Exercise 2b: Predicting Key Example Compounds
Key compound prediction from patents




 Figure 2. Graphical image of patent example compounds in chemical space. Each gray circle represents an example
 compound. The black circle, square, and triangle represent key compound candidates.



     Theory: Chemists carry out extensive SAR around key compounds.
     Cluster examples and look for centres of densely populated regions


 Kazunari Hattori; Hiroaki Wakabayashi; Kenta Tamaki; J. Chem. Inf. Model. 2008, 48, 135-142.
 DOI: 10.1021/ci7002686
 Copyright © 2008 American Chemical Society
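
The idea of looking for centres of densely populated regions can be illustrated with a simple neighbour count. This is a simplified stand-in, not the published cluster seed procedure: fingerprints are modelled as plain sets of hypothetical feature labels, and the similarity threshold of 0.6 is an arbitrary assumption.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_neighbour_count(fingerprints, threshold=0.6):
    """Rank compounds by how many other compounds lie within the threshold."""
    counts = {}
    for name, fp in fingerprints.items():
        counts[name] = sum(
            1 for other, other_fp in fingerprints.items()
            if other != name and tanimoto(fp, other_fp) >= threshold
        )
    # Most neighbours first; ties broken alphabetically for reproducibility.
    return sorted(counts, key=lambda n: (-counts[n], n))


# Toy data: each example compound is described by a set of made-up features.
examples = {
    "ex1": frozenset({"core", "F", "Me", "SO2NH2"}),
    "ex2": frozenset({"core", "F", "Me"}),
    "ex3": frozenset({"core", "Me", "SO2NH2"}),
    "ex4": frozenset({"core", "F", "SO2NH2"}),
    "ex5": frozenset({"core", "OMe", "CN"}),
}
print(rank_by_neighbour_count(examples))
```

In the toy data, ex1 sits in the middle of its close analogues and therefore ranks first, which mirrors the assumption that chemists explore SAR most densely around the key compound.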
Key compound prediction from patents
  From WO1996025405, the earliest patent which claims it, can you
  work out the structure of Bextra (Valdecoxib), the Pfizer NSAID?




74 exemplified compounds
R-group decomposition (Free-Wilson like)
FOG ranking
Key compound prediction from patents
Frequency of group (FOG) analysis


1) Extract compounds from patents
   • manual curation (GVKBIO)
   • text mining (SureChemOpen, OSCAR)

2) Define core
   • manually (e.g. from patent Markush)
   • automated core perception

3) R-group decomposition (ChemAxon’s JChem)

4) Rank compounds based on FOG (Spotfire, EXCEL)
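
Steps 3 and 4 can be sketched as follows, assuming the R-group decomposition is already available as a table. The slides do not state how the per-position frequencies are combined, so summing them is an assumption made here for illustration; the function name `fog_rank` and the toy substituents are hypothetical.

```python
from collections import Counter


def fog_rank(decomposition):
    """Rank compounds by the summed frequency of their R-groups.

    decomposition maps compound id -> {R-position: substituent}.
    """
    # Count how often each substituent occurs at each position.
    counts = {}
    for groups in decomposition.values():
        for position, substituent in groups.items():
            counts.setdefault(position, Counter())[substituent] += 1
    # Score each compound by adding up the frequencies of its own groups.
    scores = {
        compound: sum(counts[pos][sub] for pos, sub in groups.items())
        for compound, groups in decomposition.items()
    }
    # Highest score first; ties broken by compound id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy R-group table for four exemplified compounds sharing one core.
table = {
    "A": {"R1": "Me",  "R2": "SO2NH2"},
    "B": {"R1": "Me",  "R2": "SO2Me"},
    "C": {"R1": "Et",  "R2": "SO2NH2"},
    "D": {"R1": "CF3", "R2": "H"},
}
print(fog_rank(table))
```

Compound A carries the most frequently used group at both positions and therefore ranks first; the same table can equally be built and sorted in a spreadsheet.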
WO1996025405 - Bextra




 Source     #compounds   Bextra exists   Bextra rank
 GVKBIO     74           Y               1 (broad core), 1 (narrow core)
 SureChem   501          Y               1 (broad core), 1 (narrow core)
EP268956 - Aciphex




  Source     #compounds   Aciphex exists   Aciphex rank
  GVKBIO     27           Y                2 (core1), 1 (core2)
  SureChem   168          Y                1 (core1), 1 (core2)
EP296560 - Aricept




  Source     #compounds   Aricept exists   Aricept rank
  GVKBIO     76           Y                1
  SureChem   108          Y                1
EP296749 - Arimidex




  Source     #compounds   Arimidex exists   Arimidex rank
  GVKBIO     70           Y                 1
  SureChem   267          Y                 1
WO1989006649 - Raxar




X is CH, CF, CCl, or N.


  Source     #compounds   Raxar exists   Raxar rank
  GVKBIO     102          Y              5
  SureChem   0            N              n/a
US3912743 – PAXIL




 Source    #compounds   Paxil exists   Paxil rank
 GVKBIO    60           Y              54
Key compound prediction
 48 drug patents, with more than 10 compounds, including the drug


Method                        Key compound ranked first   Key compound within the first 5 ranked
Cluster seed analysis         11 (23%)                    26 (54%)
Molecular Idol                5 (10%)                     22 (46%)
Frequency of group analysis   11 (23%)                    17 (35%)
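
The percentages in this comparison are top-k hit rates: the share of patents in which the key compound is ranked at or above position k. The sketch below recomputes the reported percentages from the counts and, as a separate toy illustration, applies the same measure to the six GVKBIO ranks shown in the case-study slides (taking the core1 rank for Aciphex). The function names are illustrative only.

```python
def percentage(hits, total):
    """Hit rate as a whole-number percentage."""
    return round(100 * hits / total)


def top_k_rate(ranks, k):
    """Percentage of patents whose key compound is ranked within the top k."""
    return percentage(sum(1 for rank in ranks if rank <= k), len(ranks))


# Counts reported for the 48 drug patents: (ranked first, within first 5).
reported = {
    "Cluster seed analysis": (11, 26),
    "Molecular Idol": (5, 22),
    "Frequency of group analysis": (11, 17),
}
for method, (first, top5) in reported.items():
    print(method, percentage(first, 48), percentage(top5, 48))

# GVKBIO ranks from the case-study slides (only six patents, not the full set).
case_study_ranks = {"Bextra": 1, "Aciphex": 2, "Aricept": 1,
                    "Arimidex": 1, "Raxar": 5, "Paxil": 54}
ranks = list(case_study_ranks.values())
print(top_k_rate(ranks, 1), top_k_rate(ranks, 5))
```

The six case studies are a hand-picked subset, so their hit rates are not expected to match those of the 48-patent benchmark.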
JCIM paper
http://dx.doi.org/10.1021/ci3001293
Summary



• Unique chemistry from patents (8% out of 55M parent
  structures in AstraZeneca’s Chemistry Connect)

• Public sources for full text patent access and open
  source software for chemical text mining

• Frequency of group analysis is a simple process to
  visualize and rank patent compounds
Acknowledgements

•   Jon Winter
•   Christian Tyrchan
•   Jonas Boström
•   Mats Eriksson
•   Paul Xie
•   Chris Southan

• GVKBIO
• IBM
• SureChem

• Next Move Software
• ChemAxon
• Unilever Centre for Molecular
  Science Informatics

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 

Último (20)

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 

Patent Cheminformatics: Identification of key compounds in patents

  • 1. Patent Cheminformatics Identification of key compounds in patents 3rd Strasbourg Summer School on Chemoinformatics Strasbourg, France, 25-29 June 2012 Dr. Sorel Muresan AstraZeneca R&D Mölndal
  • 2. Outlook: Patents, brief intro; Sources for accessing full-text patents; Compound extraction from patents; Key compound prediction
  • 3. What is a patent? A patent application is an agreement between inventor and state, allowing an inventor a monopoly over their invention for a limited time. In the EU, applicants are required to disclose their inventions in a manner sufficiently clear and complete for them to be carried out by a person skilled in the art. In the United States, inventors are additionally required to include the ‘best mode’ of making or practicing the invention.
  • 4. Patents are very interesting documents Three reasons why life scientists should read patents more frequently 1. Some information appears earlier in patents than in scientific journals 2. Patents may contain sound data that never appear in the literature. 3. Patents are a source of hard-to-get information from commercial suppliers. Seeber, F. Nat. Protocols 2007
  • 5. Patents as pharmaceutical data source. Complementarity between journals and patents: “In certain fields, new advances are disclosed in patents long before they are published in peer-reviewed journals.” (Grubb, W.P.) Example: “Novel Cannabinoid-1 Receptor Inverse Agonist for the Treatment of Obesity” (modulates CNR1). Timeline: patent application, Nov 2002 → patent publication, Mar 2004 (~18 months later; USPTO patents, PN: US20040058820) → journal publication, Dec 2006 (2.5 years later; PMID: 17181138)
  • 6. Unique chemistry from patents Data from AstraZeneca’s Chemistry Connect ~6% of compound structures exemplified in patents were also published in journal articles Muresan, S. et al. DDT 2011 Southan, C. et al J.Cheminfo. 2009
  • 7. Anatomy of a patent Front page - contains a wealth of information about the patent Detailed description of the invention - the heart of a patent application. It generally describes one or more preferred embodiments of the invention in enough detail to enable someone of ordinary skill in the art to make or use the invention without having to resort to undue experimentation Claims - the most important part of the patent application. Define the scope of patent protection afforded to the owner of a patent. C.P. Miller, M.J. Evans, The Chemist's Companion Guide to Patent Law, 2011
  • 8.
  • 9.
  • 10.
  • 11. Searching for patents: official public portals; open full-text; structure → patents and patent → structure
  • 12. Manually extracted SAR data (commercial) • GOSTAR (GVKBIO Online Structure Activity Relationship Database) is a comprehensive database that captures explicit relationships between the three entities of publications, compounds and sequences. • It includes 2.6 million compounds linked to 3,500 sequences with 12.5M SAR points extracted from 43,000 patents and 67,000 articles from 125 journals
  • 13. Annotate patents manually Brat – rapid annotation tool (http://brat.nlplab.org/index.html)
  • 14. Annotate patents manually Brat – rapid annotation tool (http://brat.nlplab.org/index.html)
  • 16. Chemical Named Entity Recognition (CNER): N-Ethyl-p-menthane-3-carboxamide → name-to-structure software → C(C)NC(=O)C1CC(CCC1C(C)C)C (WS3 cooling agent, CAS 39711-79-0)
  • 17. Compound extraction via chemical text mining. Traditional text mining pipeline • Determining the start and end of IUPAC-like names in free text is a tricky business. • Chemical names can contain whitespace, hyphens, commas, parentheses, brackets, braces, apostrophes, superscripts, Greek characters, digits and periods. • This is made harder still by typos, OCR errors, hyphenation, linefeeds, XML tags, line and page numbers and similar noise.
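To make the boundary problem on slide 17 concrete, here is a minimal Python sketch of a naive candidate finder. It is not the pipeline used in the talk: the character class, the `SUFFIXES` list and the function name `candidate_names` are all assumptions made for illustration. It deliberately ignores whitespace inside names, so it shows where a simple tokenizer works and where it fails.

```python
import re

# Characters that may legitimately occur inside a systematic name.
# Whitespace is excluded on purpose, which is the main weakness of this sketch.
NAME_CHARS = re.compile(r"[A-Za-z0-9,'()\[\]{}-]+")

# Hypothetical, deliberately tiny list of name endings; a real system would
# use a trained recognizer or a large morpheme dictionary.
SUFFIXES = ("amide", "azole", "amine", "phosphine")


def candidate_names(text: str) -> list[str]:
    """Return whitespace-free tokens that look like IUPAC-style names."""
    hits = []
    for match in NAME_CHARS.finditer(text):
        # Drop sentence punctuation that the character class caught at the edges.
        token = match.group().strip(",'")
        if "-" in token and token.lower().endswith(SUFFIXES):
            hits.append(token)
    return hits


sentence = ("The cooling agent N-Ethyl-p-menthane-3-carboxamide was mixed "
            "with 1H-benzimidazole, then filtered.")
print(candidate_names(sentence))
```

A name containing a space, such as "benzoic acid", is not found by this sketch, which is exactly the difficulty the slide describes.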
  • 18. OCR Errors: Compound Names • lH-ben zimidazole → 1H-benzimidazole • 4- (2-ADAMANTYLCARBAM0YL) -5-TERT-BUTYL- PYRAZOL-1-YL] BENZOIC ACID → 4-(2-adamantylcarbamoyl)-5-tert-butyl-pyrazol-1-yl]benzoic acid • triphenylposhine → triphenylphosphine
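The three OCR examples on slide 18 can be reproduced with a few heuristics. The Python sketch below is an assumption-laden illustration, not the correction method used in the talk: the regular expressions, the tiny `VOCAB` list, the 0.85 similarity cutoff and all function names are hypothetical, and the fuzzy matching relies on the standard-library `difflib` module.

```python
import difflib
import re

# Hypothetical mini-vocabulary; a real pipeline would use a large chemical lexicon.
VOCAB = ["triphenylphosphine", "benzimidazole", "benzoic", "pyrazol"]


def normalize_ocr(name: str) -> str:
    """Apply a few rule-based repairs for common OCR noise."""
    s = re.sub(r"\s*-\s*", "-", name)                  # no spaces around hyphens
    s = re.sub(r"\]\s+(?=[A-Za-z])", "]", s)           # "YL] BENZOIC" -> "YL]BENZOIC"
    s = re.sub(r"(?<=[A-Za-z])0(?=[A-Za-z])", "O", s)  # zero inside a word -> letter O
    s = re.sub(r"\bl(?=H-)", "1", s)                   # "lH-" -> "1H-"
    return s.lower() if s.isupper() else s             # all-caps names to lower case


def join_split_words(text: str, vocab=VOCAB) -> str:
    """Merge two adjacent fragments when a vocabulary word spans the gap."""
    tokens, out, i = text.split(" "), [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            merged = tokens[i] + tokens[i + 1]
            spans_gap = any(
                merged.lower().endswith(word) and len(word) > len(tokens[i + 1])
                for word in vocab
            )
            if spans_gap:
                out.append(merged)
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return " ".join(out)


def correct_spelling(word: str, vocab=VOCAB, cutoff: float = 0.85) -> str:
    """Replace a word by its closest vocabulary entry, if one is close enough."""
    match = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
    return match[0] if match else word


print(join_split_words(normalize_ocr("lH-ben zimidazole")))
print(correct_spelling("triphenylposhine"))
```

Rules like these are easy to over-apply (a genuine zero or a genuine space can be rewritten), so in practice each repair would be checked by whether the name-to-structure step succeeds afterwards.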
  • 19. Text Mining: Conversion by Name Class (Muresan, S. ChemAxon UGM 2011)
      Class  Category         Names       n2s_1 (%)  n2s_2 (%)  n2s_3 (%)  None (%)
      M      Molecule         7,262,798   81.4       64.8       77.1       7.8
      D      Dictionary       26,876      38.1       45.1       3.5        38.5
      R      Registry number  304,064     0          0          0          100
      C      CAS number       47,815      0          0          0          100
      E      Element          836         92.8       58.5       78.7       3.2
      P      Fragment         2,663,677   72.9       58.9       0          19.8
      A      Atom fragment    96          90.6       36.5       0          6.3
      Y      Polymer          295         0          44.1       22.7       36.9
      G      Generic          1,263       2.6        6.3        0.5        91.9
      N      Noise            104         32.7       24         19.2       52.9
      Total                   10,307,824  76.3       61.0       54.3       14.1
      Slide callouts: ChemAxon 5.5 converts 60%; NCI/CADD Chemical Identifier Resolver converts 48%
  • 20. Extracting compounds from US20100221398: Google Patents + Chemicalize.org (ChemAxon)
  • 21. Extracting compounds from US20100221398: Google Patents + Chemicalize.org (ChemAxon)
  • 22. What is a key compound? Drug candidate; compound(s) with optimal physicochemical properties; compound(s) with the most suitable pharmacokinetic profile; the most biologically active tool or probe
  • 23. Key compounds from patents: “old school” techniques. Look for clues in text: “most preferred” wording in claims; crystal form info; scale of synthesis. Structural information alone: frequency of group (FOG) analysis of exemplified compounds. Structures and SAR data: work out SAR using biological data and structures
  • 24. Exercise 2b: Predicting Key Example Compounds Key compound prediction from patents Figure 2. Graphical image of patent example compounds in chemical space. Each gray circle represents an example compound. The black circle, square, and triangle represent key compound candidates. Theory: Chemists carry out extensive SAR around key compounds. Cluster examples and look for centres of densely populated regions Kazunari Hattori; Hiroaki Wakabayashi; Kenta Tamaki; J. Chem. Inf. Model. 2008, 48, 135-142. DOI: 10.1021/ci7002686 Copyright © 2008 American Chemical Society
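The idea on slide 24, that key compounds sit at the centre of densely populated regions, can be sketched in Python by counting near neighbours for each example. This is a simplified stand-in and not the published cluster seed procedure of Hattori et al.: the fingerprints are toy sets of integers, and the 0.7 Tanimoto threshold and the function names are assumptions for illustration.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_neighbours(fingerprints: dict, threshold: float = 0.7) -> list:
    """Rank examples by how many other examples lie within the threshold."""
    counts = {}
    for name, fp in fingerprints.items():
        counts[name] = sum(
            1
            for other, other_fp in fingerprints.items()
            if other != name and tanimoto(fp, other_fp) >= threshold
        )
    # Most neighbours first; ties are broken alphabetically for reproducibility.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy fingerprints (hypothetical feature identifiers, not real patent data).
examples = {
    "ex1": frozenset({1, 2, 3, 4}),
    "ex2": frozenset({1, 2, 3, 5}),
    "ex3": frozenset({1, 2, 3, 4, 5}),
    "ex4": frozenset({7, 8, 9}),
}
ranking = rank_by_neighbours(examples)
print(ranking)
```

In this toy set "ex3" is similar to both "ex1" and "ex2" and is ranked first, while the structurally isolated "ex4" comes last.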
  • 25. Key compound prediction from patents. From WO1996025405, the earliest patent that claims it, can you work out the structure of Bextra (valdecoxib), the Pfizer NSAID? 74 exemplified compounds
  • 28. Key compound prediction from patents Frequency of group (FOG) analysis 1) Extract compounds from patents • manual curation (GVKBIO) • text mining (SureChemOpen, OSCAR) 2) Define core • manually (e.g. from patent Markush) • automated core perception 3) R-group decomposition (ChemAxon’s JChem) 4) Rank compounds based on FOG (Spotfire, EXCEL)
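Step 4 of the FOG workflow on slide 28 can be sketched in a few lines of Python once an R-group table exists. The slide does not state the scoring formula, so summing the per-position group frequencies is an assumption here, as are the function name `fog_rank` and the toy R-group table.

```python
from collections import Counter


def fog_rank(rgroups: dict) -> list:
    """Rank compounds by the summed frequency of their R-groups.

    rgroups maps a compound name to {position: substituent}, i.e. the output
    of an R-group decomposition around a common core.
    """
    positions = sorted({pos for groups in rgroups.values() for pos in groups})
    # How often each substituent occurs at each position across all examples.
    freq = {
        pos: Counter(groups[pos] for groups in rgroups.values() if pos in groups)
        for pos in positions
    }
    scores = {
        name: sum(freq[pos][group] for pos, group in groups.items())
        for name, groups in rgroups.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy decomposition (hypothetical substituents, not taken from any patent).
table = {
    "ex1": {"R1": "Me", "R2": "F"},
    "ex2": {"R1": "Me", "R2": "Cl"},
    "ex3": {"R1": "Et", "R2": "F"},
    "ex4": {"R1": "Pr", "R2": "Br"},
}
print(fog_rank(table))
```

"ex1" combines the most frequent group at both positions and therefore ranks first; a product of frequencies or a per-position normalisation would be equally defensible scoring choices.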
  • 29. WO1996025405 - Bextra
      Source    #compounds  Bextra exists  Bextra ranked
      GVKBIO    74          Y              1 (broad core), 1 (narrow core)
      SureChem  501         Y              1 (broad core), 1 (narrow core)
  • 30. EP268956 - Aciphex
      Source    #compounds  Aciphex exists  Aciphex ranked
      GVKBIO    27          Y               2 (core1), 1 (core2)
      SureChem  168         Y               1 (core1), 1 (core2)
  • 32. EP296749 - Aricept
      Source    #compounds  Aricept exists  Aricept ranked
      GVKBIO    76          Y               1
      SureChem  108         Y               1
  • 33. EP296749 - Arimidex
      Source    #compounds  Arimidex exists  Arimidex ranked
      GVKBIO    70          Y                1
      SureChem  267         Y                1
  • 34. WO1989006649 - Raxar (X is CH, CF, CCl, or N)
      Source    #compounds  Raxar exists  Raxar ranked
      GVKBIO    102         Y             5
      SureChem  0           N             n/a
  • 35. US3912743 - Paxil
      Source    #compounds  Paxil exists  Paxil ranked
      GVKBIO    60          Y             54
  • 37. Key compound prediction: 48 drug patents, with more than 10 compounds, including the drug
      Method                       Key compound ranked first  Key compound within the first 5 ranked
      Cluster seed analysis        11 (23%)                   26 (54%)
      Molecular Idol               5 (10%)                    22 (46%)
      Frequency of group analysis  11 (23%)                   17 (35%)
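The two columns on slide 37 are hit rates at rank 1 and within rank 5. As a small Python sketch, the same measure can be computed for the six GVKBIO ranks quoted on the per-patent slides (29 to 35); using the core1 rank for Aciphex is a choice made here, and the function name `top_k_rate` is hypothetical. These six patents are only the ones shown individually, not the full set of 48.

```python
def top_k_rate(ranks, k: int) -> float:
    """Fraction of patents whose key compound is ranked within the top k."""
    ranks = list(ranks)
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# FOG ranks of the marketed drug using GVKBIO-extracted compounds,
# as quoted on the per-patent slides (Aciphex: core1 value).
gvkbio_ranks = {
    "Bextra": 1,
    "Aciphex": 2,
    "Aricept": 1,
    "Arimidex": 1,
    "Raxar": 5,
    "Paxil": 54,
}

print(f"top-1: {top_k_rate(gvkbio_ranks.values(), 1):.0%}")
print(f"top-5: {top_k_rate(gvkbio_ranks.values(), 5):.0%}")

# The percentages on slide 37 follow from the counts over 48 patents.
print(round(100 * 11 / 48), round(100 * 26 / 48))
```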
  • 39. Summary • Unique chemistry from patents (8% out of 55M parent structures in AstraZeneca’s Chemistry Connect) • Public sources for full text patent access and open source software for chemical text mining • Frequency of group analysis is a simple process to visualize and rank patent compounds
  • 40. Acknowledgements • Jon Winter • Christian Tyrchan • Jonas Boström • Mats Eriksson • Paul Xie • Chris Southan • GVKBIO • IBM • SureChem • Next Move Software • ChemAxon • Unilever Centre for Molecular Science Informatics