Digging bioactive chemistry out of
patents using open resources
• While the raison d'être of patents is Intellectual Property (IP), there is a
growing awareness of the scientific value of their data content. This is
particularly so in medicinal chemistry and associated bioactivity domains,
where the disclosed compounds and associated data not only exceed those
published in papers several-fold and surface years earlier but are also,
paradoxically, completely open (i.e. no paywalls). Scientists have
traditionally extracted their own relationships or used commercial sources,
but the last few years have seen a “big bang” in patent extractions
submitted to open databases, including over 20 million structures now in
PubChem.
Outline
• Statistics of patent chemistry in various sources
• Open resources, databases and tools
• Target identification
• Bioactivity and SAR extraction
• Connecting these relationships to papers
• Medicinal chemistry patent mining
• Exercises using antimalarial research as examples
• Complementarity with commercial resources
• Competitive Intelligence
N.B. Not in scope just now: web services, APIs, RDF or SAR modelling per se
This is a suggested list that can be extended to related topics attendees
would like to cover (at least if within the cognisance of the presenter!)
Preamble
Biog
Chris Southan joined the IUPHAR/BPS Guide to Pharmacology database
curation team as Senior Cheminformatician in 2013. Previously he was a
Drug Discovery Consultant at TW2Informatics in Göteborg, Sweden, working
on patent informatics. Prior to this he was a contractor for AstraZeneca
Knowledge Engineering (2009-2011), working on Chemistry Connect and
Pharma Connect. Earlier positions include the ELIXIR Database Provider
Survey for the EBI (2008-9), Principal Scientist and Bioinformatics Team
Leader at AstraZeneca (2004-7) and senior bioinformatics positions at
Oxford Glycosciences (2002-3), Gemini Genomics (2001) and SmithKline
Beecham (1987–2000). He has a PhD from the University of Munich, an M.Sc.
in Virology from Reading University and a B.Sc. Hons. in Biochemistry from
Dundee University. Further information on LinkedIn
IUPHAR/BPS Guide to PHARMACOLOGY
Publications: PubMed, ORCID iD 0000-0001-9580-0446
Blog: Bio < > Chem
Presentations: Slideshare
Twitter: https://twitter.com/cdsouthan
TW2Informatics: https://sites.google.com/view/tw2informatics/home
Audience assumptions
• Some familiarity with SAR distillation from the literature
• Many of you could extract examples from a patent by hand
• Database cognisance, including PubMed and PubChem (SID, CID)
• More interest in recent than historical SAR
• Not obsessively concerned with false-negatives (i.e. missed data)
• Not greatly perturbed by the fuzziness of public sources (that you might
grumble about for commercial ones)
• Familiar with the mess of patent families and Kind codes
• Familiar with protein names and identifiers
• Familiar with obfuscation that can confound SAR extraction
• Focused on Med Chem for human diseases
• Most of this tutorial could apply across other domains (e.g. IPC code
A01N for pesticides and herbicides)
• No boundaries between Drug Discovery and Chemical Biology
• Aware academic Drug Discovery is accelerating relative to commercial
References (I)
Chapter in: Samuel Chackalamannil, Rotella and Ward, (eds.) Comprehensive
Medicinal Chemistry III vol. 3, pp. 464–487. Oxford: Elsevier.
http://dx.doi.org/10.1016/B978-0-12-409547-2.13814-4, ISBN: 9780128032008
https://www.ncbi.nlm.nih.gov/pubmed/26194581
References (II)
https://www.ncbi.nlm.nih.gov/pubmed/23506624
https://www.ncbi.nlm.nih.gov/pubmed/23618056
https://www.ncbi.nlm.nih.gov/pubmed/24204758
Core assumption: can we believe patent SAR results?
• We know the data have value but it is difficult to assess quality extrinsically
• As for other domains, Med Chem has an experimental reproducibility crisis
• This reflects equivocality w.r.t. antibodies, cell lines and chemistry (e.g.
supplier purity and probes vs PAINS)
• For patents, high-replicate error ranges are rarely included
• Re-synthesis fidelity is also rarely reported (ever?)
• Cf. “Dispensing processes impact apparent biological activity as determined
by computational and statistical analyses” (PMID 23658723)
• We could hope that internal relative SAR across a series is more consistent
than externally comparative absolute numbers
• We know some inventor teams are world-class, well-cited medicinal chemists
but can we assess the less famous?
• The same QC considerations apply to papers
• ChEMBL surfaces the worryingly wide IC50/Ki/Kd ranges for nominally the
same assays from different papers
• We can also intersect some patent and paper values
• Is the internal consistency of patent-derived SAR models a useful QC?
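One way to put a number on the spread between nominally equivalent measurements is to convert each IC50 to pIC50 (the negative log10 of the molar value) and look at the range. The sketch below is illustrative only: the function names and the three example values are invented, not taken from ChEMBL or from any patent.

```python
import math

def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 (-log10 of the molar value)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)

def spread(ic50s_nm):
    """Return (fold range, pIC50 range) for repeated measurements."""
    lowest, highest = min(ic50s_nm), max(ic50s_nm)
    return highest / lowest, pic50_from_nm(lowest) - pic50_from_nm(highest)

# Invented values: one compound, nominally the same assay, three documents.
reported_nm = [12.0, 45.0, 300.0]
fold, log_units = spread(reported_nm)
print(f"{fold:.0f}-fold range = {log_units:.2f} log units")
```

Working in log units makes it easier to compare the scatter between documents with the activity differences inside a single series.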
Introductory example
• 138 detailed descriptions of the series
• WO2013083991 (SureChEMBL, PubChem)
• IC50 cross-reactivity data from no fewer
than five cell-based enzyme assays
• Human NMT1 (P30419), human NMT2
(O60551), Plasmodium vivax (A5K1A2),
Plasmodium falciparum (Q8ILW6) and
Leishmania donovani (Q8ILW6)
• https://cdsouthan.blogspot.se/2013/07/n-myristoyltransferase-patent-and-pdb.html
Stats
So how much useful SAR is in the patent corpus?
• Definition for SAR: a document "D" reports a bioactivity assay "A" (e.g.
for an enzyme) with a quantitative result "R" (e.g. an IC50) for a
compound "C" (defined chemical structure) as an activity modulator (e.g.
an inhibitor) of protein target "P" (also for cellular targets, e.g.
anti-infectives)
• A useful shorthand for this mapping is “D-A-R-C-P”
• Excelra (ex GVKBIO) provides a good statistical starting point
• https://www.slideshare.net/cdsouthan/largescale-curation-of-bioactive-chemistry-from-patents-and-papers
• April 2017 numbers were 1.34 mill cpds from 112K papers and 3.35
mill from 71K patents, with a 0.18 mill overlap
• From the earlier PMID 24204758, 12 cpds/paper and 46/patent
• Human protein targets: 3383 in the former, 2431 in the latter, 3882
combined, with 546 patent-only
• The Excelra absolute activity numbers are dependent on their capping
rules for binned data (e.g. IC50 between 10 and 100 nM)
• Binned data still useful for modelling
• Where are the enzyme activators?
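The D-A-R-C-P mapping can be held as one record per measurement. The following is a minimal sketch, assuming a flat record with optional lower and upper bounds so that binned results fit the same shape as exact ones. The class name, field names, identifiers and the capping rule (report the upper bound of a bin) are all hypothetical and are not Excelra's rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DarcpRecord:
    document: str                      # D: patent or paper identifier
    assay: str                         # A: assay description
    result_type: str                   # R: e.g. "IC50"
    compound: str                      # C: structure identifier or example number
    protein: str                       # P: target identifier
    lower_nm: Optional[float] = None   # lower bound of the result, in nM
    upper_nm: Optional[float] = None   # upper bound of the result, in nM

    @property
    def is_binned(self) -> bool:
        """True when the document gives a range rather than a single value."""
        return self.lower_nm != self.upper_nm

    def capped_value_nm(self) -> Optional[float]:
        """One possible capping rule (an assumption): use the upper bound."""
        return self.upper_nm if self.upper_nm is not None else self.lower_nm

# Placeholder identifiers and values, for illustration only.
exact = DarcpRecord("DOC-1", "enzyme inhibition", "IC50", "example 1",
                    "TARGET-X", lower_nm=25.0, upper_nm=25.0)
binned = DarcpRecord("DOC-1", "enzyme inhibition", "IC50", "example 2",
                     "TARGET-X", lower_nm=10.0, upper_nm=100.0)
print(exact.is_binned, binned.is_binned, binned.capped_value_nm())
```

Keeping the bounds rather than a single capped number leaves the choice of capping rule to whoever builds the model.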
Independent estimates of SAR total
• WIPO PATENTSCOPE A61 and C07 PCTs = 93,253
• Not all have SAR data from novel composition of matter first-filings
• Many will be “secondary” filings (e.g. synthesis and/or crystallisation)
• Generic companies file many of these for de-risked cpds
• Some first-filings for a chemotype series may not have any activity
data disclosed (stats unknown)
• We can thus assume that extractable SAR from med chem patents in
the last five years may be in only 30,000-50,000 documents
• Guesstimate: ~50K patents ≈ 3.50 million bioactive structures (cf.
Excelra 3.35 million)
• Asian patents under-represented? (i.e. are we missing unique
structures & SAR?)
BindingDB public SAR curation:
useful benchmark extraction stats
• Patents: 1,879
• Binding measurements: 199,588
• Compounds: 132,170
• Target proteins: 1,225
• Assays: 2,668
• Average Number of Targets per Patent: 1.95
• Usually primary plus a specificity paralogue cross-screen
• ~70 compounds/patent
• ~100 affinity measurements/patent
Data courtesy of Tiqing Liu and Michael Gilson, Oct 2017
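The per-patent averages follow directly from the BindingDB counts, and the same ratio gives a rough cross-check on the ~50K patent guesstimate. This is a back-of-envelope sketch; the variable names are ours, and the 50,000 figure is the assumption from the earlier estimate, not a measured count.

```python
# BindingDB patent curation counts quoted on the slide (Oct 2017).
patents = 1_879
compounds = 132_170
measurements = 199_588

compounds_per_patent = compounds / patents
measurements_per_patent = measurements / patents

# Assumption: ~50,000 SAR-bearing patents, as guesstimated earlier.
assumed_sar_patents = 50_000
projected_structures = assumed_sar_patents * round(compounds_per_patent)

print(f"{compounds_per_patent:.1f} compounds/patent")
print(f"{measurements_per_patent:.1f} measurements/patent")
print(f"{projected_structures / 1e6:.1f} million structures projected")
```

The projection ignores structures shared between patents, so it is best read as an upper bound.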
Patent chemistry stats inside PubChem
The three major CNER sources inside PubChem
[Venn diagram of the three sources; counts (Oct 2017) are CIDs in millions]
• Source totals: IBM = 10.7, SCRIPDB = 4.0, SureChEMBL = 17.6
• Segment values: 2.9, 2.4, 4.7, 10.1, 0.6, 0.4, 0.50
• Union = 21.7
• 3-way = 2.4
• 3 + 2-way = 8.1
• Unique = 13.5
Raises questions about corroboration vs divergence
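The summary figures can be re-derived from the seven segment values. The sketch below assumes which values are single-source, two-source and three-source segments; that assignment is not stated in the transcript and is chosen here because it reproduces the quoted "Unique" and "3 + 2-way" numbers.

```python
# Segment values from the slide, in millions of CIDs. The grouping into
# single-source, two-source and three-source segments is an assumption.
only_one_source = [10.1, 2.9, 0.50]
two_sources = [4.7, 0.6, 0.4]
three_sources = 2.4

unique = round(sum(only_one_source), 1)
shared = round(sum(two_sources) + three_sources, 1)
union = round(unique + shared, 1)

print(f"Unique = {unique}")
print(f"3 + 2-way = {shared}")
# The slide quotes a union of 21.7; the rounded segments sum to 21.6.
print(f"Union = {union}")
print(f"Share found in more than one source = {shared / union:.2f}")
```

On these numbers, well under half of the structures are corroborated by a second source, which is the corroboration vs divergence question in numeric form.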
The chemistry stats: wheat vs chaff
• If we accept a certain proportion of binned data as useful, the max
SAR we could expect to align is ~3 to 4 mill structures
• But how can we select these from the 22 million (and climbing) in
PubChem?
• The easiest way is to come in from the literature with clean structures
• This can expand the SAR around a target anywhere from 2- to 10-fold
• But we have an unknown statistic: what proportion of good patent SAR
sets, including for novel targets, never get into a paper? (examples
anyone?)
• Excelra have some relevant stats on this – does anyone else?
Sources
Sources offer a broad spectrum of utilities
• Connecting to patents via structures from papers
• Connecting via targets and/or diseases from papers
• Proximity “Walking” doc <> doc, target <> target, struc <> struc
• Finding patents via metadata (e.g. assignee, target and date)
• Viewing chemistry content in the document
• Establishing if the document has useful SAR
• Finding which sources have extracted chemistry
• Mapping the structures to the activity values
• DIY extraction of structures not yet in a source (e.g. images and/or
IUPAC strings)
• Collating an SAR table
• Best to get familiar with in-depth functionality of a few sources
• Many roads lead to Rome so difficult to know which is most efficient
• I certainly have not tried all those of probable utility
Source: BindingDB target-mapped SAR extraction
BindingDB
• Pre-cooked expert curation
• Modest but steady growth
• Easy to browse list
• Structures > PubChem and subsumed > ChEMBL
• Targets mapped to UniProt even for titles with no target
• Many search features, some unique (i.e. different to ChEMBL)
• Novel targets from patents and unique journal selection
• Download full SAR sets: example no. > structure > activity > target
• Lag time in PubChem indexing
• No anti-infective whole organism targets
• US publications some years behind the WO first pubs
• Dependent on CWU structures that are not all correct
Source: WIPO PATENTSCOPE
WIPO PATENTSCOPE
• Comprehensive and up to date
• Instant metrics (yellow highlight) as you toggle search parameters
• Sign in for saved searches
• Useful instant graphics on result lists
• Search reports can “walk” you to other relevant filings
• Pithy examiner comments (almost) amusing
• Limited text search fields
• In-line table images, pros and cons
• Slow image loading
• Inventor/applicant conflation
The WIPO “gift horse”
• ~ 7 million strucs, WO and US from 1978
• WIPO collab w. InfoChem and NextMove
• False-negatives (i.e. examples missed)
• Not yet in PubChem
• Limited utility for SAR mining so far
Source: EPO Espacenet
I prefer WIPO as a search portal but Espacenet is useful for INPADOC families
Source: SureChEMBL
SureChEMBL
• For SAR extraction the best first-stop-shop (after BindingDB)
• Chemistry indexed a week or less from publication date
• Family-wide structure downloads
• Powerful combination of filters and search functionality
• Multiple source x-refs including PubChem and ChEMBL
• Can correct IUPAC failures and paste out example blocks
• Usual caveats of CNER (but hey, 18 mill structures for free)
• Extraction confounded by dense image tables
• WIPOs less well extracted than USPTOs (but OCR not their fault)
• Overhead of futile common chemistry extraction
• Slow image load times and structure step-through
• Need to watch PubChem load dates (via SIDs)
• The feature that never appeared :(
Source: PubChem
PubChem
• Mother of all searchable portals with 22 mill patent compounds
• SureChEMBL, ChEMBL, BindingDB and IBM are in it
• Massive feature set including Entrez
• Patent and PubMed connectivity via structure
• Very useful Identifier Exchange Service for set mapping
• Can upload SD files (e.g. from Chemicalize or SciFinder)
• Transparent and navigable chemistry rules (e.g. “same connectivity”)
• Slice ‘n dice full Boolean search history
• Extensive filter options
• Direct Venn from CID lists < 10,000
• Similar compound clustering > isolate an SAR series
• Can “walk” through chemical neighbourhood > cluster > cluster patent hop
by chemotype (target neutral)
• Navigation can be daunting
• Some large sources should be kicked out IMHO
• Interface queries often time out
New search interface includes patents
Source: ChEMBL
ChEMBL
• Gateway to chemistry manually extracted from journals
• 0.39 mill structures mapped across to SureChEMBL
• This gives direct journal < > patent connectivity
• Powerful query, filtration, browsing and target indexing
• Release 23 has 1.02 mill structures and assay data from 67,722 papers
• Circular subsumption of 0.5 mill structures from confirmed PubChem
BioAssays
• Integrates the BindingDB patent curation (but sync lag)
• Indexed in PubChem BioAssay
• Target-linked entries subsumed into BindingDB
• Linked to EPMC
• Good for paper <> patent
• Not linked to PubMed
• Up to 2 year lag for papers
• Selective journal capture
[Venn diagram: ChEMBL 1.34 mill, SureChEMBL 17.23 mill, overlap 0.39 mill]
Source: Europe PubMed Central
Europe PubMed Central
• Fully featured literature search functionality
• Big plus is the (HAS_CHEMBL:y) select for chemistry
• Gives query > paper > ChEMBL chemistry > SureChEMBL and/or >
PubChem
• Bioentity mark up from other sources
• De facto two-stop shop with PubMed which has different functionality
• Warning: their patent abstracts have not been updated since 2012
Source: PubMed
PubMed
• Largest entry point to connect Med Chem papers < > patents
• Entities disclosed, i.e. target protein IDs, affiliations, chemical structures
• Power of Entrez, including MeSH
• PubMed > PDB (via MMDB) good for CID of ligands > patent
• Can connect inventors with unusual names
• However, papers typically ~ 2 years behind “fresh” patents
• May find enough SAR for popular targets not to bother with patents
• But (unless paper in ChEMBL) you may have to DIY extract entities
including chemistry
• Patents good at citing papers (US mandated to be thorough)
• However, many authors avoid citing their patents
• Connect into literature via targets and diseases and thence > patents
• JFTR disease searching in patent text largely useless (titles maybe)
• Patent reviews valuable but tend to be in hard-to-get journals
Patent review articles: doing the groundwork for you
Unusual linking
PubMed > PubChem > Guide to Pharmacology BACE2 page
Source: Open Google
Date cutting, e.g. by one year, actually works
Source: Google Patents
Targets
Status of human targets from open sources
(as UniProt x-refs)
[Figure panels: Oct 2016 and Oct 2017]
• Most have chemistry > target via papers (thus can search patents)
• Outer limit of data-supported druggable proteome
• Some patent only in BindingDB
Patent retrieval by target names: not so easy
Patent retrieval by target names
In: Lecture Notes in Bioinformatics (ISBN 978-3-642-15119-4), P Lambrix and
G Kemp (Eds.), Springer Verlag, pp 106-121, 2010
Classification of target names in titles
AWK Gene and protein names can be noisy and inconsistently used
by applicants but HGNC approved symbol usage seems to be improving
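To illustrate why noisy naming makes title retrieval hard, the sketch below maps name variants in a title back to an approved-style symbol. It is a toy under stated assumptions: the synonym table is a small illustrative selection and far from complete, the function name is ours, and the titles are invented.

```python
import re

# Illustrative synonym table (not exhaustive); keys are HGNC-style symbols.
SYNONYMS = {
    "BACE1": ["BACE1", "BACE-1", "beta-secretase 1", "memapsin 2"],
    "BACE2": ["BACE2", "BACE-2", "beta-secretase 2"],
}

def symbols_in_title(title: str) -> set:
    """Return the symbols whose name variants appear in a patent title."""
    found = set()
    for symbol, names in SYNONYMS.items():
        for name in names:
            # Lookarounds stop a variant matching inside a longer token.
            pattern = rf"(?<!\w){re.escape(name)}(?!\w)"
            if re.search(pattern, title, re.IGNORECASE):
                found.add(symbol)
    return found

# Invented titles covering a named target, a synonym and no target at all.
titles = [
    "Aminodihydrothiazine derivatives as BACE-1 inhibitors",
    "Inhibitors of beta-secretase 2 for the treatment of diabetes",
    "Novel heterocyclic compounds",
]
for title in titles:
    print(title, "->", sorted(symbols_in_title(title)))
```

The third title is the difficult case: with no target named in the title, only the description or a curated mapping can supply it.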
The three levels of title
Tools
Utility of tools
• Can re-run IUPACs and images where automated
conversion failed
• Synergies of gap filling from working between the original
document, the SureChEMBL output and the OPSIN and
OSRA tools
• Can run on PubMed abstracts, individually or bulk
• Can isolate the example series of structures that have the SAR
• Useful for extraction from papers not in ChEMBL
• May be necessary to convert between formats e.g. for
uploading to PubChem
Simple PubMed search
Venny
• Excellent for set comparisons of any strings, even over 10,000
• E.g. CIDs, InChIKeys or UniProt IDs
• It automatically de-duplicates
• Download complete intersects and diffs from any segment of the Venn
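The same kind of comparison can be sketched locally with Python sets. This is not how Venny is implemented; the function name and the identifiers are made up, and the point is only to show de-duplication and the seven segments of a three-way comparison.

```python
def venn3(a, b, c):
    """Return the seven segments of a three-way Venn as sets."""
    a, b, c = set(a), set(b), set(c)   # set() removes duplicate entries
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b": (a & b) - c,
        "a_c": (a & c) - b,
        "b_c": (b & c) - a,
        "a_b_c": a & b & c,
    }

# Made-up identifiers standing in for CIDs from three sources.
source_a = ["1", "2", "3", "3", "4"]
source_b = ["3", "4", "5"]
source_c = ["4", "5", "6", "7"]

segments = venn3(source_a, source_b, source_c)
for name, members in segments.items():
    print(name, sorted(members))
```

Any string identifier works the same way, so the lists could equally hold InChIKeys or UniProt IDs.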
PubChem structure search
PubChem Identifier Exchange
OpenBabel
Format conversions e.g. SciFinder SDF to InChIKey
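A conversion of this kind can be scripted with Open Babel's Python bindings. The sketch assumes Open Babel 3 is installed with its `pybel` module; both function names are ours, and the de-duplication step is our addition because several SD records can resolve to the same InChIKey.

```python
def sdf_to_inchikeys(sdf_path):
    """Read an SD file and return one InChIKey string per record."""
    # Imported here so the helper below can be used without Open Babel.
    from openbabel import pybel
    keys = []
    for mol in pybel.readfile("sdf", sdf_path):
        tokens = mol.write("inchikey").split()
        # Keep only the key itself; records that fail give an empty string.
        keys.append(tokens[0] if tokens else "")
    return keys

def unique_in_order(keys):
    """Drop blanks and duplicates while keeping first-seen order."""
    seen, out = set(), []
    for key in keys:
        if key and key not in seen:
            seen.add(key)
            out.append(key)
    return out
```

A typical call would be `unique_in_order(sdf_to_inchikeys("export.sdf"))`, where the file name is a placeholder. The resulting list can then be mapped to CIDs.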
Example of coverage from US9181236
• 173 BindingDB CIDs curated from PubChem via US9181236
• 405 substances as SDF from SciFinder > OpenBabel > 391 InChIKeys > 362 CIDs
• 1657 rows > 834 SureChEMBL IDs > 664 CIDs
• 3-way Venn of CIDs
Chemicalize.org from ChemAxon
Chemicalize Google patent webpage result
OPSIN for IUPAC names
• Conversion of compound 19 from WO2016096979 after fixing OCR errors
• N-r3-r(26',3i?)-5-amino-2-methyl-3-(trifluoromethyl)-3,4- dihvdropyrrol-2-yl1-
4-fluoro-phenyl1-5-chloro-pyridine-2-carboxamide.
• Good for iterative correction via error flagging, which Chemicalize does not offer
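The corrections can be recorded as an explicit substitution list, which keeps the clean-up repeatable. In the sketch below the substitutions and the resulting name are our own plausible reading of the OCR text (for example "26'" as "2S" and "3i?" as "3R"); they have not been checked against the patent or run through OPSIN. The two helper checks are cheap local pre-checks, not OPSIN's error flagging.

```python
import re

RAW = ("N-r3-r(26',3i?)-5-amino-2-methyl-3-(trifluoromethyl)-3,4- "
       "dihvdropyrrol-2-yl1-4-fluoro-phenyl1-5-chloro-pyridine-2-carboxamide")

# Assumed OCR misreads, applied in order: (text found, intended text).
FIXES = [
    ("N-r3-r(", "N-[3-[("),   # "[" read as "r"
    ("26'", "2S"),            # italic stereodescriptor S
    ("3i?", "3R"),            # italic stereodescriptor R
    ("- dihv", "-dihy"),      # stray space and "y" read as "v"
    ("yl1-", "yl]-"),         # "]" read as "1"
]

def apply_fixes(name, fixes=FIXES):
    for wrong, right in fixes:
        name = name.replace(wrong, right)
    return name

def ocr_flags(name):
    """Characters that rarely belong in a systematic name."""
    return re.findall(r"[?'\s]", name)

def brackets_balanced(name):
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in name:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack

fixed = apply_fixes(RAW)
print(fixed)
print("flags left:", ocr_flags(fixed), "balanced:", brackets_balanced(fixed))
```

A name that passes these checks can still be wrong, so the name-to-structure conversion remains the real test.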
Result examples
Getting SAR out the hard way
Collating the hard way
• Three versions of the SAR table from WO2016096979
• On the left is the original from page 64 of the PDF
• In the centre is the corresponding section of the SureChEMBL mark-up
• The right hand panel is an Excel paste-across of the centre section
• But you have to complete it by pasting in the SMILES of the structures on the previous page
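The final collation step amounts to joining two pasted tables on the example number. A minimal sketch, assuming one mapping of example number to SMILES and another of example number to activity; all values are hypothetical placeholders and the column names are ours.

```python
import csv
import io

# Hypothetical pasted data, keyed by example number.
smiles_by_example = {"1": "CCO", "2": "c1ccccc1", "3": "CC(=O)O"}
ic50_by_example = {"1": "12", "2": "<10", "4": "250"}

def collate(smiles, activities):
    """Join on example number; report examples present in only one table."""
    rows, unmatched = [], []
    for example in sorted(set(smiles) | set(activities), key=int):
        if example in smiles and example in activities:
            rows.append((example, smiles[example], activities[example]))
        else:
            unmatched.append(example)
    return rows, unmatched

rows, unmatched = collate(smiles_by_example, ic50_by_example)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["example", "smiles", "ic50_nM"])
writer.writerows(rows)
print(out.getvalue())
print("needs manual check:", unmatched)
```

Activity values are kept as text so that qualifiers such as "<10" survive the join. Unmatched example numbers are listed rather than dropped, since they usually point to an extraction gap.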
Getting SAR out the easy way, via BindingDB
So what's next?
Wish list
Yup, we can dig a lot of SAR out of patents
But wouldn’t it be nice if…
• Clarivate re-instated the Derwent patent chemistry feed to PubChem
• Open standard SAR modelling tools (with AI natch’) maybe KNIME?
(table in > model out)
• These might show large patent SAR sets to be better than those from papers
• Someone indexed full text patents by gene name counts inside the
description section (SureChEMBL for Open PHACTS?)
• SureChEMBL would finally bring in their document section stats
• Run the SureChEMBL engine on full-text papers and PubMed abstracts
• Europe PubMed Central updated their EPO C07/A61 patent abstracts
from 2012
• We could paste large text chunks > Chemicalize but not run out of points
• Patents could be more like good papers….
Could the future be automatic?
https://www.slideshare.net/NextMoveSoftware
Impressive results
That’s it for now

Más contenido relacionado

La actualidad más candente

II-SDV 2016 Aleksandar Kapisoda, Klaus Kater - Deep Web Search
II-SDV 2016 Aleksandar Kapisoda, Klaus Kater - Deep Web SearchII-SDV 2016 Aleksandar Kapisoda, Klaus Kater - Deep Web Search
II-SDV 2016 Aleksandar Kapisoda, Klaus Kater - Deep Web SearchDr. Haxel Consult
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryDr. Haxel Consult
 
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...Dr. Haxel Consult
 
ICIC 2017: Publication Analysis and Publication Strategy
ICIC 2017: Publication Analysis and Publication Strategy  ICIC 2017: Publication Analysis and Publication Strategy
ICIC 2017: Publication Analysis and Publication Strategy Dr. Haxel Consult
 
ICIC 2014 New Product Introduction Wiley
ICIC 2014 New Product Introduction WileyICIC 2014 New Product Introduction Wiley
ICIC 2014 New Product Introduction WileyDr. Haxel Consult
 
ICIC 2014 New Product Presentations ChemAxon
ICIC 2014 New Product Presentations ChemAxon ICIC 2014 New Product Presentations ChemAxon
ICIC 2014 New Product Presentations ChemAxon Dr. Haxel Consult
 
II-SDV 2016 Aalt van de Kuilen - The Art of Patent Landscaping
II-SDV 2016 Aalt van de Kuilen - The Art of Patent LandscapingII-SDV 2016 Aalt van de Kuilen - The Art of Patent Landscaping
II-SDV 2016 Aalt van de Kuilen - The Art of Patent LandscapingDr. Haxel Consult
 
II-SDV 2016 Questel Intellixir
II-SDV 2016 Questel IntellixirII-SDV 2016 Questel Intellixir
II-SDV 2016 Questel IntellixirDr. Haxel Consult
 
ICIC 2014 New Product Introduction InfoChem
ICIC 2014 New Product Introduction InfoChemICIC 2014 New Product Introduction InfoChem
ICIC 2014 New Product Introduction InfoChemDr. Haxel Consult
 
ICIC 2013 Conference Proceedings Sebastian Radestock
ICIC 2013 Conference Proceedings Sebastian RadestockICIC 2013 Conference Proceedings Sebastian Radestock
ICIC 2013 Conference Proceedings Sebastian RadestockDr. Haxel Consult
 
ICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBLICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBLDr. Haxel Consult
 
IC-SDV 2019: Competitive Intelligence: how to optimize the analysis of pipeli...
IC-SDV 2019: Competitive Intelligence: how to optimize the analysis of pipeli...IC-SDV 2019: Competitive Intelligence: how to optimize the analysis of pipeli...
IC-SDV 2019: Competitive Intelligence: how to optimize the analysis of pipeli...Dr. Haxel Consult
 
ICIC 2013 Conference Proceedings Nicolas Lalyre Syngenta
ICIC 2013 Conference Proceedings Nicolas Lalyre SyngentaICIC 2013 Conference Proceedings Nicolas Lalyre Syngenta
ICIC 2013 Conference Proceedings Nicolas Lalyre SyngentaDr. Haxel Consult
 
Publishing data and code openly
Publishing data and code openlyPublishing data and code openly
Publishing data and code openlyFAIRDOM
 

La actualidad más candente (20)

II-SDV 2016 Linguamatics
II-SDV 2016 LinguamaticsII-SDV 2016 Linguamatics
II-SDV 2016 Linguamatics
 
II-SDV 2016 Aleksandar Kapisoda, Klaus Kater - Deep Web Search
II-SDV 2016 Aleksandar Kapisoda, Klaus Kater - Deep Web SearchII-SDV 2016 Aleksandar Kapisoda, Klaus Kater - Deep Web Search
II-SDV 2016 Aleksandar Kapisoda, Klaus Kater - Deep Web Search
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
 
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
 
ICIC 2017: Publication Analysis and Publication Strategy
ICIC 2017: Publication Analysis and Publication Strategy  ICIC 2017: Publication Analysis and Publication Strategy
ICIC 2017: Publication Analysis and Publication Strategy
 
ICIC 2014 New Product Introduction Wiley
ICIC 2014 New Product Introduction WileyICIC 2014 New Product Introduction Wiley
ICIC 2014 New Product Introduction Wiley
 
ICIC 2014 New Product Presentations ChemAxon
ICIC 2014 New Product Presentations ChemAxon ICIC 2014 New Product Presentations ChemAxon
ICIC 2014 New Product Presentations ChemAxon
 
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
 
II-SDV 2016 Aalt van de Kuilen - The Art of Patent Landscaping
II-SDV 2016 Aalt van de Kuilen - The Art of Patent LandscapingII-SDV 2016 Aalt van de Kuilen - The Art of Patent Landscaping
II-SDV 2016 Aalt van de Kuilen - The Art of Patent Landscaping
 
II-SDV 2016 Questel Intellixir
II-SDV 2016 Questel IntellixirII-SDV 2016 Questel Intellixir
II-SDV 2016 Questel Intellixir
 
ICIC 2014 New Product Introduction InfoChem
ICIC 2014 New Product Introduction InfoChemICIC 2014 New Product Introduction InfoChem
ICIC 2014 New Product Introduction InfoChem
 
II-SDV 2016 Minesoft
II-SDV 2016 MinesoftII-SDV 2016 Minesoft
II-SDV 2016 Minesoft
 
Presentation of ChemSPider at PubChem Public Meeting
Presentation of ChemSPider at PubChem Public MeetingPresentation of ChemSPider at PubChem Public Meeting
Presentation of ChemSPider at PubChem Public Meeting
 
ICIC 2013 Conference Proceedings Sebastian Radestock
ICIC 2013 Conference Proceedings Sebastian RadestockICIC 2013 Conference Proceedings Sebastian Radestock
ICIC 2013 Conference Proceedings Sebastian Radestock
 
ICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBLICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBL
 
IC-SDV 2019: Competitive Intelligence: how to optimize the analysis of pipeli...
IC-SDV 2019: Competitive Intelligence: how to optimize the analysis of pipeli...IC-SDV 2019: Competitive Intelligence: how to optimize the analysis of pipeli...
IC-SDV 2019: Competitive Intelligence: how to optimize the analysis of pipeli...
 
ICIC 2013 Conference Proceedings Nicolas Lalyre Syngenta
ICIC 2013 Conference Proceedings Nicolas Lalyre SyngentaICIC 2013 Conference Proceedings Nicolas Lalyre Syngenta
ICIC 2013 Conference Proceedings Nicolas Lalyre Syngenta
 
Publishing data and code openly
Publishing data and code openlyPublishing data and code openly
Publishing data and code openly
 
Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...
 
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
 

Similar a ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open resources

Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityChris Southan
 
The open patent chemistry “big bang”: Implications, opportunities and caveats
The open patent chemistry “big bang”: Implications, opportunities and caveatsThe open patent chemistry “big bang”: Implications, opportunities and caveats
The open patent chemistry “big bang”: Implications, opportunities and caveatsDr. Haxel Consult
 
Mining Drug Targets, Structures and Activity Data
Mining Drug Targets, Structures and Activity DataMining Drug Targets, Structures and Activity Data
Mining Drug Targets, Structures and Activity DataChris Southan
 
20 million public patent structures: looking at the gift horse
20 million public patent structures: looking at the gift horse20 million public patent structures: looking at the gift horse
20 million public patent structures: looking at the gift horseChris Southan
 
Connecting chemistry-to-biology
Connecting chemistry-to-biology Connecting chemistry-to-biology
Connecting chemistry-to-biology Chris Southan
 
Quality and noise in big chemistry databases
Quality and noise in big chemistry databasesQuality and noise in big chemistry databases
Quality and noise in big chemistry databasesChris Southan
 
The Open Patent Chemistry “Big Bang”: Implications, Opportunities and Caveats
The Open Patent Chemistry “Big Bang”: Implications, Opportunities and CaveatsThe Open Patent Chemistry “Big Bang”: Implications, Opportunities and Caveats
The Open Patent Chemistry “Big Bang”: Implications, Opportunities and CaveatsChris Southan
 
Patent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsPatent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsChris Southan
 
Exploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsExploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsSunghwan Kim
 
Patent annotations: From SureChEMBL to Open PHACTS
Patent annotations: From SureChEMBL to Open PHACTSPatent annotations: From SureChEMBL to Open PHACTS
Patent annotations: From SureChEMBL to Open PHACTSopen_phacts
 
Peptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbPeptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbChris Southan
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsSean Ekins
 
PubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistryPubChem: a public chemical information resource for big data chemistry
PubChem: a public chemical information resource for big data chemistrySunghwan Kim
 

Similar a ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open resources (20)

Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivity
 
Patents in PubChem
Patents in PubChemPatents in PubChem
Patents in PubChem
 
The open patent chemistry “big bang”: Implications, opportunities and caveats
The open patent chemistry “big bang”: Implications, opportunities and caveatsThe open patent chemistry “big bang”: Implications, opportunities and caveats
The open patent chemistry “big bang”: Implications, opportunities and caveats
 
Mining Drug Targets, Structures and Activity Data
Mining Drug Targets, Structures and Activity DataMining Drug Targets, Structures and Activity Data
Mining Drug Targets, Structures and Activity Data
 
20 million public patent structures: looking at the gift horse
20 million public patent structures: looking at the gift horse20 million public patent structures: looking at the gift horse
20 million public patent structures: looking at the gift horse
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open resources

  • 1. Digging bioactive chemistry out of patents using open resources
      • While the raison d'être of patents is Intellectual Property (IP), there is a growing awareness of the scientific value of their data content. This is particularly so in medicinal chemistry and associated bioactivity domains, where disclosed compounds and associated data not only exceed those published in papers several-fold and surface years earlier but are also, paradoxically, completely open (i.e. no paywalls). Scientists have traditionally extracted their own relationships or used commercial sources, but the last few years have seen a “big bang” in patent extractions submitted to open databases, including over 20 million structures now in PubChem.
  • 2. Outline
      • Statistics of patent chemistry in various sources
      • Open resources, databases and tools
      • Target identification
      • Bioactivity and SAR extraction
      • Connecting these relationships to papers
      • Medicinal chemistry patent mining
      • Exercises using antimalarial research as examples
      • Complementarity with commercial resources
      • Competitive Intelligence
      N.B. Not in scope just now: web services, APIs, RDF or SAR modelling per se.
      This is a suggested list that can be extended to related topics attendees would like to cover (at least if within the cognisance of the presenter!)
  • 4. Biog
      Chris Southan joined the IUPHAR/BPS Guide to Pharmacology database curation team as Senior Cheminformatician in 2013. Previously he was a Drug Discovery Consultant at TW2Informatics in Göteborg, Sweden, working on patent informatics. Prior to this he was a contractor for AstraZeneca Knowledge Engineering (2009-2011), working on Chemistry Connect and Pharma Connect. Earlier positions include the ELIXIR Database Provider Survey for the EBI (2008-9), Principal Scientist and Bioinformatics Team Leader at AstraZeneca (2004-7) and senior bioinformatics positions at Oxford Glycosciences (2002-3), Gemini Genomics (2001) and SmithKline Beecham (1987–2000). He has a PhD from the University of Munich, an M.Sc. in Virology from Reading University and a B.Sc. Hons. in Biochemistry from Dundee University.
      • Further information on LinkedIn
      • IUPHAR/BPS Guide to PHARMACOLOGY
      • Publications: PubMed
      • ORCID iD: 0000-0001-9580-0446
      • Blog: Bio < > Chem
      • Presentations: Slideshare
      • Twitter: https://twitter.com/cdsouthan
      • TW2Informatics: https://sites.google.com/view/tw2informatics/home
  • 5. Audience assumptions
      • Some familiarity with SAR distillation from the literature
      • Many of you could extract examples from a patent by hand
      • Database cognisance, including PubMed and PubChem (SID, CID)
      • More interest in recent than historical SAR
      • Not obsessively concerned with false-negatives (i.e. missed data)
      • Not greatly perturbed by the fuzziness of public sources (that you might grumble about for commercial ones)
      • Familiar with the mess of patent families and Kind codes
      • Familiar with protein names and identifiers
      • Familiar with obfuscation that can confound SAR extraction
      • Focused on Med Chem for human diseases
      • Most of this tutorial could apply across other domains (e.g. IPC code A01N for pesticides and herbicides)
      • No boundaries between Drug Discovery and Chemical Biology
      • Aware that academic Drug Discovery is accelerating relative to commercial
  • 6. References (I)
      • Chapter in: Chackalamannil, Rotella and Ward (eds.), Comprehensive Medicinal Chemistry III, vol. 3, pp. 464–487. Oxford: Elsevier. http://dx.doi.org/10.1016/B978-0-12-409547-2.13814-4, ISBN: 9780128032008
      • https://www.ncbi.nlm.nih.gov/pubmed/26194581
  • 8. Core assumption: can we believe patent SAR results?
      • We know the data has value, but it is difficult to assess quality extrinsically
      • As for other domains, Med Chem has an experimental reproducibility crisis
      • This reflects equivocality w.r.t. antibodies, cell lines and chemistry (e.g. supplier purity and probes vs PAINS)
      • For patents, high-replicate error ranges are rarely included
      • Re-synthesis fidelity is also rarely reported (ever?)
      • Cf. “Dispensing processes impact apparent biological activity as determined by computational and statistical analyses” (PMID 23658723)
      • We could hope that internal relative SAR across a series is more consistent than externally comparative absolute numbers
      • We know some inventor teams are world-class, well-cited medicinal chemists, but can we assess the less famous?
      • The same QC considerations apply to papers
      • ChEMBL surfaces the worryingly wide IC50/Ki/Kd ranges on nominally the same assays from different papers
      • We can also intersect some patent and paper values
      • Is the internal consistency of patent-derived SAR models a useful QC?
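The spread between nominally identical measurements can be made concrete by converting IC50 values to pIC50 and reporting the fold-range per compound-target pair. The following is a minimal sketch of that idea only; the compound names, target labels and IC50 values are invented placeholders, not data from ChEMBL, a paper or any patent.

```python
import math
from collections import defaultdict


def pic50(ic50_nm):
    """Convert an IC50 given in nM to pIC50 (-log10 of the molar value)."""
    return 9.0 - math.log10(ic50_nm)


def fold_ranges(measurements):
    """Group (compound, target, ic50_nm) rows and return max/min per pair."""
    grouped = defaultdict(list)
    for compound, target, ic50_nm in measurements:
        grouped[(compound, target)].append(ic50_nm)
    return {pair: max(values) / min(values) for pair, values in grouped.items()}


# Hypothetical repeat measurements of the same compound on the same target,
# e.g. one value from a patent and one from a later paper.
rows = [
    ("cpd-1", "target-A", 10.0),
    ("cpd-1", "target-A", 250.0),
    ("cpd-2", "target-A", 40.0),
    ("cpd-2", "target-A", 80.0),
]

ranges = fold_ranges(rows)
for (compound, target), fold in sorted(ranges.items()):
    print(f"{compound} on {target}: {fold:.1f}-fold spread")
```

A large fold-range for a pair flags values that should not be pooled without checking the assay descriptions; what counts as "large" is a judgment call that this sketch does not settle.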
  • 9. Introductory example
      • 138 detailed descriptions of the series
      • WO2013083991 (SureChEMBL, PubChem)
      • IC50 cross-reactivity data from no fewer than five cell-based enzyme assays
      • Human NMT1 (P30419), human NMT2 (O60551), Plasmodium vivax (A5K1A2), Plasmodium falciparum (Q8ILW6) and Leishmania donovani (Q8ILW6)
      • https://cdsouthan.blogspot.se/2013/07/n-myristoyltransferase-patent-and-pdb.html
  • 11. So how much useful SAR is in the patent corpus?
      • Definition for SAR: bioactivity assay "A" (e.g. for an enzyme) with a quantitative result "R" (e.g. an IC50) for a compound "C" (defined chemical structure) as an activity modulator (e.g. inhibition) of protein target "P" (also for cellular targets, e.g. anti-infectives)
      • A useful shorthand for this mapping is “D-A-R-C-P”
      • Excelra (ex GVKBIO) provides a good statistical starting point
      • https://www.slideshare.net/cdsouthan/largescale-curation-of-bioactive-chemistry-from-patents-and-papers
      • April 2017 numbers were 1.34 million compounds from 112K papers and 3.35 million from 71K patents, with a 0.18 million overlap
      • From the earlier PMID 24204758: 12 compounds/paper and 46/patent
      • Human protein targets: 3383 in the former and 2431 in the latter; 3882 combined, with 546 patent-only
      • The Excelra absolute activity numbers depend on their capping rules for binned data (e.g. IC50 between 10 and 100 nM)
      • Binned data are still useful for modelling
      • Where are the enzyme activators?
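The D-A-R-C-P shorthand maps naturally onto a flat record, and the April 2017 counts can be checked with simple arithmetic. The sketch below assumes that "D" stands for the source document and that the 0.18 million overlap means compounds found in both papers and patents; the field names and the example record are hypothetical placeholders, not curated data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DARCP:
    """One document-assay-result-compound-protein relationship."""
    document: str        # D: patent or paper identifier (assumed meaning of "D")
    assay: str           # A: assay description
    result_value: float  # R: quantitative result
    result_units: str    #    units for R
    compound: str        # C: defined structure, here as a SMILES string
    protein: str         # P: target identifier, e.g. a UniProt accession


# Placeholder record; none of these values come from a real document.
record = DARCP(
    document="WO-EXAMPLE",
    assay="enzyme inhibition",
    result_value=100.0,
    result_units="nM",
    compound="CCO",
    protein="P00000",
)

# April 2017 figures quoted on the slide (compounds in millions).
paper_compounds_m = 1.34
patent_compounds_m = 3.35
overlap_m = 0.18

# Inclusion-exclusion, under the overlap assumption stated above.
union_m = paper_compounds_m + patent_compounds_m - overlap_m

compounds_per_paper = paper_compounds_m * 1e6 / 112_000
compounds_per_patent = patent_compounds_m * 1e6 / 71_000

print(f"union: {union_m:.2f} million compounds")
print(f"per paper: {compounds_per_paper:.1f}, per patent: {compounds_per_patent:.1f}")
```

The per-document averages computed this way are close to the 12 and 46 quoted from PMID 24204758, which suggests the ratios have been fairly stable between the two snapshots.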
  • 12. Independent estimates of SAR total
      • WIPO PATENTSCOPE A61 and C07 PCTs = 93,253
      • Not all have SAR data from novel composition-of-matter first-filings
      • Many will be “secondary” filings (e.g. synthesis and/or crystallisation)
      • Generic companies file many of these for de-risked compounds
      • Some first-filings for a chemotype series may not have any activity data disclosed (stats unknown)
      • We can thus assume that extractable SAR from med chem patents in the last five years may come from only 30,000-50,000 documents
      • Guesstimate: ~50K patents ~ 3.50 million bioactive structures (cf. Excelra 3.35 million)
      • Asian patents under-represented? (i.e. are we missing unique structures & SAR?)
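The guesstimate can be reproduced as a one-line multiplication. The slide does not state the compounds-per-patent figure behind it; the sketch below assumes ~70, which reproduces 3.5 million and matches the BindingDB benchmark, and also shows the result for the 46/patent average quoted from PMID 24204758.

```python
def estimate_structures(n_patents, compounds_per_patent):
    """Rough total of bioactive structures for a given patent count."""
    return n_patents * compounds_per_patent


# Document-count bounds from the slide; per-patent averages are assumptions.
low_docs, high_docs = 30_000, 50_000

upper = estimate_structures(high_docs, 70)   # assumed ~70 compounds/patent
lower = estimate_structures(low_docs, 46)    # 46/patent from PMID 24204758

print(f"range: {lower / 1e6:.2f} to {upper / 1e6:.2f} million structures")
```

The wide range is the point: the total is very sensitive to both the number of SAR-containing documents and the assumed average series size.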
  • 13. BindingDB public SAR curation: useful benchmark extraction stats
      • Patents: 1,879
      • Binding measurements: 199,588
      • Compounds: 132,170
      • Target proteins: 1,225
      • Assays: 2,668
      • Average number of targets per patent: 1.95
      • Usually primary plus a specificity paralogue cross-screen
      • ~70 compounds/patent
      • ~100 affinity measurements/patent
      Data courtesy of Tiqing Liu and Michael Gilson, Oct 2017
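The per-patent averages follow directly from the totals above. This short sketch recomputes them from the quoted counts; the variable names are illustrative only.

```python
# Totals quoted on the slide (BindingDB patent curation, Oct 2017).
patents = 1_879
measurements = 199_588
compounds = 132_170
assays = 2_668

compounds_per_patent = compounds / patents
measurements_per_patent = measurements / patents
measurements_per_compound = measurements / compounds
assays_per_patent = assays / patents

print(f"compounds/patent:      {compounds_per_patent:.1f}")
print(f"measurements/patent:   {measurements_per_patent:.1f}")
print(f"measurements/compound: {measurements_per_compound:.2f}")
print(f"assays/patent:         {assays_per_patent:.2f}")
```

The measurements-per-compound ratio being only modestly above 1 is consistent with most compounds having a primary-target value and a minority also having a cross-screen value.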
  • 14. Patent chemistry stats inside PubChem
  • 15. The three major CNER sources inside PubChem
      • IBM = 10.7, SCRIPDB = 4.0, SureChEMBL = 17.6
      • Venn segment counts: 2.9, 2.4, 4.7, 10.1, 0.6, 0.4, 0.50
      • Union = 21.7; 3-way = 2.4; 3 + 2-way = 8.1; Unique = 13.5
      • Counts (Oct 2017) are CIDs in millions
      • Raises questions about corroboration vs divergence
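Venn counts like these can be derived from three CID lists with plain set operations. The sketch below uses tiny invented CID sets to show the segments (source-unique, pairwise-only and 3-way); the set contents are hypothetical and the function name is illustrative, not part of any PubChem tool.

```python
def venn3_counts(a, b, c):
    """Return the seven Venn segment sizes plus summary counts for three sets."""
    a, b, c = set(a), set(b), set(c)
    three_way = a & b & c
    segments = {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "ab_only": len((a & b) - c),
        "ac_only": len((a & c) - b),
        "bc_only": len((b & c) - a),
        "abc": len(three_way),
    }
    segments["unique"] = segments["a_only"] + segments["b_only"] + segments["c_only"]
    segments["shared"] = len(a | b | c) - segments["unique"]  # 3-way + 2-way
    segments["union"] = len(a | b | c)
    return segments


# Toy CID sets standing in for the three sources (invented values).
ibm = {1, 2, 3, 4}
scripdb = {3, 4, 5}
surechembl = {2, 3, 6, 7}

counts = venn3_counts(ibm, scripdb, surechembl)
print(counts)

# Consistency check on the slide's own summary numbers (millions of CIDs):
# unique + shared should equal the union, up to rounding of the inputs.
slide_gap = abs((13.5 + 8.1) - 21.7)
```

Structures in the "shared" segments are corroborated by at least two independent extractions, whereas the source-unique ones are where divergence, and possibly extraction error, is concentrated.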
  • 16. The chemistry stats: wheat vs chaff
      • If we accept a certain proportion of binned data as useful, the maximum SAR we could expect to align is ~3 to 4 million structures
      • But how can we select these from the 22 million (and climbing) in PubChem?
      • The easiest way is to come in from the literature with clean structures
      • This can expand the SAR around a target anywhere from 2- to 10-fold
      • But we have an unknown statistic: what proportion of good patent SAR sets, including for novel targets, never get into a paper? (examples anyone?)
      • Excelra have some relevant stats on this; does anyone else?
  • 18. Sources offer a broad spectrum of utilities • Connecting to patents via structures from papers • Connecting via targets and/or diseases from papers • Proximity “Walking” doc <>doc, target <> target, struc <> struc • Finding patents via metadata (e.g. assignee, target and date) • Viewing chemistry content in the document • Establishing if the document has useful SAR • Finding which sources have extracted chemistry • Mapping the structures to the activity values • DIY extraction of structures not yet in a source (e.g. images and/or IUPAC strings • Collating an SAR table • Best to get familiar with in-depth functionality of a few sources • Many roads lead to Rome so difficult to know which is most efficient • I certainly have not tried all those of probable utility 18
  • 19. Source : BindingDB target-mapped SAR extraction 19
  • 20. BindingDB • Pre-cooked expert curation • Modest but steady growth • Easy to browse list • Structures > PubChem and subsumed > ChEMBL • Targets mapped to UniProt even for titles with no target • Many search features, some unique (i.e. different to ChEMBL) • Novel targets from patents and unique journal selection • Download full SAR sets: example no. > structure > activity > target • Lag time in PubChem indexing • No anti-infective whole-organism targets • US publications some years behind the WO first pubs • Dependent on CWU structures that are not all correct 20
  • 22. WIPO PATENTSCOPE • Comprehensive and up to date • Instant metrics (yellow highlight) as you toggle search parameters • Sign in for saved searches • Useful instant graphics on result lists • Search reports can “walk” you to other relevant filings • Pithy examiner comments (almost) amusing • Limited text search fields • In-line table images, pros and cons • Slow image loading • Inventor/applicant conflation 22
  • 23. The WIPO “gift horse” 23 • ~ 7 million strucs, WO and US from 1978 • WIPO collab w. InfoChem and NextMove • False-negatives (i.e. examples missed) • Not yet in PubChem • Limited utility for SAR mining so far
  • 24. Source: EPO Espace 24 I prefer WIPO as a search portal but Espace is useful for INPADOC families
  • 26. SureChEMBL • For SAR extraction the best first-stop-shop (after BindingDB) • Chemistry indexed a week or less from publication date • Family-wide structure downloads • Powerful combination of filters and search functionality • Multiple source x-refs including PubChem and ChEMBL • Can correct IUPAC failures and paste out example blocks • Usual caveats of CNER (but hey, 18 mill structures for free) • Extraction confounded by dense image tables • WIPOs less well extracted than USPTOs (but OCR not their fault) • Overhead of futile common chemistry extraction • Slow image load times and structure step-through • Need to watch PubChem load dates (via SIDs) • The feature that never appeared :( 26
  • 28. PubChem • Mother of all searchable portals with 22 mill patent compounds • SureChEMBL, ChEMBL, BindingDB and IBM are in it • Massive feature set including Entrez • Patent and PubMed connectivity via structure • Very useful Identifier Exchange Service for set mapping • Can upload SD files (e.g. from Chemicalize or SciFinder) • Transparent and navigable chemistry rules (e.g. “same connectivity”) • Slice ‘n dice full Boolean search history • Extensive filter options • Direct Venn from CID lists < 10,000 • Similar compound clustering > isolate an SAR series • Can “walk” through chemical neighbourhood > cluster > cluster patent hop by chemotype (target neutral) • Navigation can be daunting • Some large sources should be kicked out IMHO • Interface queries often time out 28
  • 29. New search interface includes patents 29
  • 31. ChEMBL • Gateway to chemistry manually extracted from journals • 0.39 mill structures mapped across to SureChEMBL • This gives direct journal <> patent connectivity • Powerful query, filtration, browsing and target indexing • Release 23 has 1.02 mill structures and assay data from 67,722 papers • Circular subsumption of 0.5 mill structures from confirmed PubChem BioAssays • Integrates the BindingDB patent curation (but sync lag) • Indexed in PubChem BioAssay • Target-linked entries subsumed into BindingDB • Linked to EPMC • Good for paper <> patent • Not linked to PubMed • Up to 2 year lag for papers • Selective journal capture • Venn figure labels: overlap 0.39 mill, ChEMBL 1.34 mill, SureChEMBL 17.23 mill 31
  • 32. Source: Europe PubMed Central 32
  • 33. Europe PubMed Central • Fully featured literature search functionality • Big plus is the (HAS_CHEMBL:y) select for chemistry • Gives query > paper > ChEMBL chemistry > SureChEMBL and/or > PubChem • Bioentity mark-up from other sources • De facto two-stop shop with PubMed, which has different functionality • Warning: their patent abstracts have not been updated since 2012 33
  • 35. PubMed • Largest entry point to connect Med Chem papers <> patents • Entities disclosed, i.e. target protein IDs, affiliations, chemical structure • Power of Entrez, including MeSH • PubMed > PDB (via MMD) good for CID of ligands > patent • Can connect inventors with unusual names • However, papers are typically ~2 years behind “fresh” patents • May find enough SAR for popular targets not to bother with patents • But (unless the paper is in ChEMBL) you may have to DIY extract entities including chemistry • Patents good at citing papers (US mandated to be thorough) • However, many authors avoid citing their patents • Connect into literature via targets and diseases and thence > patents • JFTR disease searching in patent text is largely useless (titles maybe) • Patent reviews valuable but tend to be in hard-to-get journals 35
  • 36. Patent review articles: doing the groundwork for you 36
  • 38. PubMed > PubChem > Guide to Pharmacology BACE2 page 38
  • 39. Source: Open Google 39 Date cutting e.g. by one year, actually works
  • 42. Status of human targets from open sources (as UniProt x-refs) • Figure panels: Oct 2016 and Oct 2017 • Most have chemistry > target via papers (thus can search patents) • Outer limit of data-supported druggable proteome • Some patent-only in BindingDB 42
  • 43. Patent retrieval by target names: not so easy 43
  • 44. Patent retrieval by target names 44 In: Lecture Notes in Bioinformatics (ISBN 978-3-642-15119-4), P Lambrix and G Kemp (Eds.), Springer Verlag, pp 106-121, 2010
  • 45. Classification of target names in titles 45 AWK Gene and protein names can be noisy and inconsistently used by applicants, but usage of HGNC approved symbols seems to be improving
  • 46. The three levels of title 46
  • 48. Utility of tools • Can re-run IUPACs and images where automated conversion failed • Synergies of gap filling from working between the original document, the SureChEMBL output and the OPSIN and OSRA tools • Can run on PubMed abstracts, individually or in bulk • Can isolate the example series of structures that have the SAR • Useful for extraction from papers not in ChEMBL • May be necessary to convert between formats, e.g. for uploading to PubChem 48
  • 50. Venny • Excellent for set comparisons of any strings, even lists of over 10,000 • E.g. CIDs, InChIKeys or UniProt IDs • It automatically de-duplicates • Download complete intersects and diffs from any segment of the Venn 50
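The kind of comparison Venny performs can be sketched with plain set operations. This is not Venny's implementation, only an illustration of de-duplication plus the seven regions of a three-way Venn; the function name and the identifiers are placeholders, not real extraction results.

```python
def three_way_venn(a, b, c):
    """Return the seven Venn regions for three collections of identifiers."""
    a, b, c = set(a), set(b), set(c)   # converting to sets de-duplicates
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b": (a & b) - c,
        "a_c": (a & c) - b,
        "b_c": (b & c) - a,
        "a_b_c": a & b & c,
    }

# Placeholder CIDs; note the duplicate in the first list.
bindingdb = [101, 102, 103, 103]
scifinder = [102, 103, 104]
surechembl = [103, 104, 105, 106]

regions = three_way_venn(bindingdb, scifinder, surechembl)
for name, members in regions.items():
    print(name, sorted(members))
```

The same function works for any hashable strings, such as InChIKeys or UniProt IDs.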
  • 53. OpenBabel 53 Format conversions e.g. SciFinder SDF to InChIKey
  • 54. Example of coverage from US9181236 54 • 173 BindingDB CIDs curated from PubChem via US9181236 • 405 substances as SDF from SciFinder > OpenBabel > 391 IK (InChIKeys) > 362 CIDs • 1657 rows > 834 SureChEMBL IDs > 664 CIDs • 3-way Venn of CIDs
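The counts on this slide describe two conversion funnels. A small sketch of how much survives each step, using only the numbers quoted above (the step labels and the `retention` helper are illustrative, and the sketch says nothing about why entries drop out at a given step):

```python
def retention(steps):
    """Yield (label, count, percent retained from the previous step)."""
    previous = None
    for label, count in steps:
        pct = None if previous is None else 100 * count / previous
        yield label, count, pct
        previous = count

scifinder_route = [("SDF substances", 405), ("InChIKeys", 391), ("CIDs", 362)]
surechembl_route = [("rows", 1657), ("SureChEMBL IDs", 834), ("CIDs", 664)]

for route in (scifinder_route, surechembl_route):
    for label, count, pct in retention(route):
        note = "" if pct is None else f" ({pct:.1f}% of previous step)"
        print(f"{label}: {count}{note}")
```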
  • 56. Chemicalize Google patent webpage result 56
  • 57. OPSIN for IUPAC names 57 • Conversion of compound 19 from WO2016096979 after fixing OCR errors • N-r3-r(26',3i?)-5-amino-2-methyl-3-(trifluoromethyl)-3,4- dihvdropyrrol-2-yl1- 4-fluoro-phenyl1-5-chloro-pyridine-2-carboxamide. • Good for iterative correction via error flagging, which Chemicalize does not offer
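The iterative fix-and-flag loop described here can be partly scripted before a name is pasted into OPSIN. The sketch below is not part of OPSIN; the substitution rules are assumptions inferred by eye from this one name (for example, that "r" before a locant is a misread "[" and "1" after "yl" is a misread "]"). Anything it cannot resolve, such as the garbled stereodescriptors, is flagged for manual correction rather than guessed.

```python
import re

# OCR text exactly as it appears in the transcript.
RAW = ("N-r3-r(26',3i?)-5-amino-2-methyl-3-(trifluoromethyl)-3,4- "
       "dihvdropyrrol-2-yl1- 4-fluoro-phenyl1-5-chloro-pyridine-2-carboxamide")

# ASSUMED rules, illustrative only; a real rule set needs wider testing.
RULES = [
    (r"\s+", ""),              # spaces left behind by line wraps
    (r"-r(?=[\d(])", "-["),    # "[" misread as "r" before a locant or "("
    (r"(?<=yl)1(?=-)", "]"),   # "]" misread as "1" after a substituent
    (r"dihvdro", "dihydro"),   # "y" misread as "v"
]

def clean_name(raw):
    """Apply each substitution rule in order and return the result."""
    name = raw
    for pattern, replacement in RULES:
        name = re.sub(pattern, replacement, name)
    return name

def flag_problems(name):
    """List issues that still need a human: bracket balance, odd characters."""
    problems = []
    for opener, closer in ("()", "[]"):
        if name.count(opener) != name.count(closer):
            problems.append(f"unbalanced {opener}{closer}")
    suspicious = sorted(set(re.findall(r"[^A-Za-z0-9,\-()\[\]]", name)))
    if suspicious:
        problems.append("unexpected characters: " + " ".join(suspicious))
    return problems

cleaned = clean_name(RAW)
print(cleaned)
print(flag_problems(cleaned))
```

The character whitelist is deliberately strict, so legitimate primes or other punctuation in valid names would also be flagged; that errs on the side of a manual look.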
  • 59. Getting SAR out the hard way 59
  • 60. Collating the hard way • Three versions of the SAR table from WO2016096979 • On the left is the original from page 64 of the PDF • In the centre is the corresponding section of the SureChEMBL mark-up • The right-hand panel is an Excel paste-across of the centre section • But you have to complete it by pasting in SMILES for the structures on the previous page 60
  • 61. Getting SAR out the easy way, via BindingDB 61
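The final collation step, joining structures to activity values by example number, can be sketched as a simple keyed merge. All data below are placeholders (the SMILES, values and the nM unit are assumptions for illustration, not taken from WO2016096979), and the `collate` helper is hypothetical.

```python
import csv
import io

# Placeholder inputs standing in for what would be pasted out of a patent:
# example number -> SMILES, and example number -> reported activity.
structures = {"1": "CCO", "2": "c1ccccc1", "3": "CC(=O)O"}
activities = {"1": "12", "2": "340", "4": "5.6"}   # assumed unit: nM

def collate(structures, activities):
    """Join on example number; report examples missing either half."""
    rows, unmatched = [], []
    for example in sorted(set(structures) | set(activities), key=int):
        smiles = structures.get(example)
        value = activities.get(example)
        if smiles is None or value is None:
            unmatched.append(example)
        else:
            rows.append({"example": example, "smiles": smiles,
                         "activity_nM": value})
    return rows, unmatched

rows, unmatched = collate(structures, activities)

# Write the matched rows as CSV text, ready to paste into a spreadsheet.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["example", "smiles", "activity_nM"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
print("needs manual check:", unmatched)
```

Listing the unmatched example numbers makes it obvious which structures or values still have to be recovered from the original document.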
  • 63. Wish list • Yup, we can dig a lot of SAR out of patents • But wouldn’t it be nice if… • Clarivate re-instated the Derwent patent chemistry feed to PubChem • Open-standard SAR modelling tools (with AI, natch) were available, maybe KNIME? (table in > model out) • These might show large patent SAR sets to be better than those from papers • Someone indexed full-text patents by gene name counts inside the description section (SureChEMBL for Open PHACTS?) • SureChEMBL would finally bring in their document section stats • Someone ran the SureChEMBL engine on full-text papers and PubMed abstracts • Europe PubMed Central updated their EPO C07/A61 patent abstracts from 2012 • We could paste large text chunks > Chemicalize but not run out of points • Patents could be more like good papers… 63
  • 64. Could the future be automatic? 64 https://www.slideshare.net/NextMoveSoftware