SureChem - Integrating with public and proprietary data sources (ACS Fall 2012)
1. Integrating patent chemistry with public and private non-patent research resources
Nicko Goncharoff
Andrew Hinton, PhD
Christopher Southan, PhD
ACS Fall 2012, 19 August
4. SureChem Data Collection
Database of automatically mined structure data from text and images
• 20M annotated US, EP, WO full text records and Japan patent abstracts
• 12M unique chemical structures
• MEDLINE – 19M abstracts (coming Q4)
5.
• Free resource for researchers
• Enables linking to public and proprietary content
• Professional search needs
• Data export, alerts, patent family search, chemical relevance filters…
• API or Data Feed access to chemistry & full text
• Integrate with internal databases & workflows
8. Current Patent Sources in PubChem
[Bar chart. Y-axis: number of SIDs, 0 to 4,000,000. Sources on the x-axis: EPO (Sling), Chemicalize.org, IBM, Thomson Pharma. Bar values shown: 3.7 M, 2.3 M, 280 K, 10 K.]
9. Patent & Literature Sources in PubChem: The Big Three
[Three-circle Venn diagram.]
• Thomson Pharma, patents and literature: 3,756,283 (41% lead-like)
• ChEMBL + PubMed + Journals: 918,077 (45% lead-like)
• IBM, pre-2000 patents: 2,369,481 (32% lead-like)
Region counts shown in the diagram: 3,291,940; 281,920; 515,745; 52,975; 129,448; 67,437; 2,113,169
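The seven region counts can be checked against the three stated totals with simple addition. The sketch below does that in Python. The transcript does not say which count belongs to which region, so the assignment used here is an assumption, and the variable names are illustrative only.

```python
# Stated per-source totals from the slide.
stated = {"Thomson Pharma": 3_756_283, "ChEMBL": 918_077, "IBM": 2_369_481}

# ASSUMPTION: assignment of the seven region counts to Venn regions.
# The transcript lists the numbers without saying which region each belongs to.
only = {"Thomson Pharma": 3_291_940, "ChEMBL": 515_745, "IBM": 2_113_169}
pair = {
    frozenset({"Thomson Pharma", "ChEMBL"}): 281_920,
    frozenset({"Thomson Pharma", "IBM"}): 129_448,
    frozenset({"ChEMBL", "IBM"}): 67_437,
}
all_three = 52_975


def rebuilt_total(source):
    """Sum every Venn region that lies inside the given source's circle."""
    pairs = sum(n for members, n in pair.items() if source in members)
    return only[source] + pairs + all_three


# Difference between each stated total and the total rebuilt from regions.
differences = {s: stated[s] - rebuilt_total(s) for s in stated}

for source, diff in differences.items():
    print(f"{source}: stated {stated[source]:,}, "
          f"rebuilt {rebuilt_total(source):,}, difference {diff:,}")
```

Under this assignment the Thomson Pharma and ChEMBL totals are reproduced exactly, while the rebuilt IBM total is 6,452 lower than the stated figure. The transcript alone cannot show whether that gap comes from the diagram or from the assumed assignment.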
10. SureChem to Deposit All Structures* into PubChem - 2012
• 1976 to present
• Deposition of structures only
• View related patents in SureChemOpen
• *Some filtering of common chemistry likely
11. SureChem and IBM in PubChem (2 Example Patents)
US583593, Inhibitors of squalene synthetase and protein farnesyltransferase, Abbott
• SureChem total: 776; IBM total: 527
• SureChem only: 478; shared: 298; IBM only: 229
WO-1994018188-A1, 4-hydroxy-benzopyran-2-ones and 4-hydroxy-cycloalkyl[b]pyran-2-ones HIV protease inhibitors, Upjohn
• SureChem total: 832; IBM total: 239
• SureChem only: 686; shared: 146; IBM only: 93
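The per-patent numbers fit a two-set overlap. The sketch below takes the three numbers for each patent as SureChem-only, shared, and IBM-only counts, an interpretation of the diagram that the stated totals bear out. It recomputes each source's total and the combined number of distinct structures. The function name is illustrative only.

```python
def two_set_summary(only_a, shared, only_b):
    """Return (total_a, total_b, union) for a two-set Venn diagram."""
    return only_a + shared, shared + only_b, only_a + shared + only_b


# (SureChem only, shared, IBM only), read from the slide under the stated assumption.
examples = {
    "US583593": (478, 298, 229),
    "WO-1994018188-A1": (686, 146, 93),
}

for patent, regions in examples.items():
    surechem_total, ibm_total, union = two_set_summary(*regions)
    print(f"{patent}: SureChem {surechem_total}, IBM {ibm_total}, combined {union}")
```

The recomputed totals (776 and 527; 832 and 239) match the slide. The combined counts, 1,005 and 925, are derived here and do not appear in the original.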
17. SureChem Chemical Relevance Filtering
• Frequency counts of chemicals within patents
• Additional molecular property filtering, i.e. Lipinski descriptors
• Natural Language Processing-based indexing of Exemplified Compounds
Automated indexing of Exemplified Compounds in text
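The first two filters on this slide can be illustrated with a short Python sketch. It is not SureChem's implementation. It assumes that structures appearing in many patents count as common chemistry, and that descriptor values are already computed. The record type, compound names, descriptor values, and `max_patents` threshold are all hypothetical. The property check is the standard Lipinski rule of five in its strict form, with no violations allowed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record; descriptor values are assumed to be precomputed."""
    name: str
    mol_weight: float  # daltons
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def passes_lipinski(c):
    """Lipinski rule of five, strict form: no violations allowed."""
    return (c.mol_weight <= 500 and c.logp <= 5
            and c.h_donors <= 5 and c.h_acceptors <= 10)


def patent_frequency(patents):
    """Count, for each compound name, how many patents mention it."""
    counts = Counter()
    for names in patents.values():
        counts.update(set(names))  # each patent counts once per compound
    return counts


def relevant(compounds, patents, max_patents):
    """Keep compounds that are not too common and pass the property filter."""
    freq = patent_frequency(patents)
    return [c.name for c in compounds
            if freq[c.name] <= max_patents and passes_lipinski(c)]


# Illustrative data only: names and values are made up.
patents = {
    "P1": ["solvent_X", "candidate_A"],
    "P2": ["solvent_X", "large_B"],
    "P3": ["solvent_X", "solvent_X"],
}
compounds = [
    Compound("solvent_X", 46.0, -0.3, 1, 1),
    Compound("candidate_A", 350.0, 2.5, 2, 5),
    Compound("large_B", 720.0, 6.1, 6, 12),
]
kept = relevant(compounds, patents, max_patents=2)
print(kept)
```

In this toy data, `solvent_X` is dropped because it appears in all three patents, and `large_B` is dropped because it fails the property check. Only `candidate_A` remains.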
18. Conclusion
SureChem deposition into PubChem will
– Significantly expand public patent chemistry scope
– Contribute unique and timely MedChem-relevant data
– Enable open drug discovery and chemical biology
– Advance progress toward a more open, federated
chemical information network