[1]
Chemicalize.org, SureChemOpen, PubChem and
the InChIKey: A heavenly conjunction with
transformative utility
Christopher Southan, TW2Informatics, Göteborg, Sweden,
ChemAxon UGM, Budapest, May 2013
Image credit: http://www.eso.org/public/images/yb_vlt_moon_cnn_cc/
[2]
Dr Christopher Southan, Ph.D., M.Sc., B.Sc.
TW2Informatics: http://www.cdsouthan.info/Consult/CDS_cons.htm
Mobile: +46(0)702-530710
Skype: cdsouthan
Email: cdsouthan@hotmail.com
Twitter: http://twitter.com/#!/cdsouthan
Blog: http://cdsouthan.blogspot.com/
LinkedIn: http://www.linkedin.com/in/cdsouthan
Publications: http://www.citeulike.org/user/cdsouthan/order/year,,/publications
Presentations: http://www.slideshare.net/cdsouthan
[3]
Abstract
The ChemAxon name-to-structure functionality is not only a component of the SureChem patent extraction pipeline but also powers chemicalize.org. Both operations now submit structures to PubChem (PC). SureChem's depositions bring the patent-extracted total in PC to 14.5 million CIDs. The chemicalize.org deposition is smaller (~0.3 million), but its structures have been actively selected by users and 20% of them are unique. The final conjunction is that all three sources generate the InChIKey (IK), which turns Google into a de facto merge of PubChem and ChemSpider covering ~50 million structures.
Chemicalize.org users can convert new patents, other external or internal documents, and web-based text. Individual results can be Googled or searched against SureChemOpen, and bulk extractions can be triaged against PubChem. It thus becomes possible to connect chemistry between patents, papers, abstracts and database records via exact-match or similarity searching. When SureChem and chemicalize.org update their submissions, relationships with the other ~200 PubChem sources (including ChEMBL and vendor databases) are recomputed and new CID links are made. The synergy between SureChem and chemicalize.org is powerful because the ~0.15 million matches between them, accessed via SureChemOpen, give occurrence statistics and the location of each structure within patents. The applications of chemicalize.org are extended by web tools such as Venny, for determining intersects between multiple extractions, and CheS-Mapper, for cluster visualization. These utility expansions will be illustrated with documents specifying BACE1 inhibitors for Alzheimer's disease.
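Because every source above emits the InChIKey, it helps to know how a key is laid out. The sketch below splits a standard InChIKey into its blocks and rejects malformed strings. It is a minimal illustration assuming the published InChIKey layout (a 14-character connectivity hash, an 8-character hash of the remaining layers followed by a standard/non-standard flag and a version character, then one protonation character); the names `parse_inchikey`, `InChIKeyParts` and `same_skeleton` are hypothetical and not part of any ChemAxon, SureChem or PubChem API.

```python
import re
from typing import NamedTuple

# Assumed standard InChIKey layout (27 characters, uppercase letters only):
#   14-char connectivity hash - 8-char hash of other layers + flag + version - protonation
INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


class InChIKeyParts(NamedTuple):
    connectivity: str   # first block: hashed molecular skeleton
    other_layers: str   # hashed stereo, isotope and other layers
    standard_flag: str  # 'S' = standard InChIKey, 'N' = non-standard
    version: str        # 'A' = InChI version 1
    protonation: str    # 'N' = neutral; other letters encode added/removed protons


def parse_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey into its blocks; raise ValueError if it is malformed."""
    match = INCHIKEY_RE.fullmatch(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return InChIKeyParts(*match.groups())


def same_skeleton(key_a: str, key_b: str) -> bool:
    """Relaxed match: True when two keys share the 14-character connectivity block."""
    return parse_inchikey(key_a).connectivity == parse_inchikey(key_b).connectivity


if __name__ == "__main__":
    parts = parse_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")  # aspirin, standard key
    print(parts.connectivity, parts.standard_flag, parts.protonation)
```

Comparing only the connectivity block is a common way to relax an exact-key match so that stereoisomers and isotopic variants of the same skeleton line up; the full 27-character key is what an exact Google or database search matches.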
[4]
Auspicious Conjunctions 2012-13
• PubChem: global chemistry to slice 'n' dice
• SureChemOpen: majority of patent chemistry opened up
• Chemicalize.org: chemistry extractable from any text tombs
• Chemical images: patents extracted in SureChemOpen; OSRA handles papers
• InChIKey indexing in Google
• ChemSpider: crowdsourcing chemistry quality
• Expanding toolbox, e.g. OPSIN, Venny, CheS-Mapper
• SciBite alerts
• Expanding preview and surfacing options, e.g. ChEMBL-NTD, GitHub, OSDD, Open Lab Books, figshare, etc.
• Rise of mobile chemistry
[5]
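Databases <> structures <> documents
[Figure: structure counts linking documents (abstracts, patents, papers) with databases. Recoverable labels: 15 mill; 0.2 mill (MeSH); 0.8 mill (ChEMBL); 12K. Google InChIKey coverage ~50 million (47m PubChem + 33m UniChem + 28m ChemSpider, overlapping).]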
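The ~50 million figure is far smaller than the sum of the three source counts, which only works if the sources overlap heavily. Assuming ~50 million is the de-duplicated union, and writing P, U and C for the sets of InChIKeys in PubChem, UniChem and ChemSpider (counts in millions, rounded as on the slide), inclusion-exclusion gives the net overlap and the part not already in PubChem:

```latex
\begin{aligned}
|P \cup U \cup C| &= |P| + |U| + |C| - |P \cap U| - |P \cap C| - |U \cap C| + |P \cap U \cap C| \\
|P \cap U| + |P \cap C| + |U \cap C| - |P \cap U \cap C| &= |P| + |U| + |C| - |P \cup U \cup C| \approx 47 + 33 + 28 - 50 = 58 \\
|(U \cup C) \setminus P| &= |P \cup U \cup C| - |P| \approx 50 - 47 = 3
\end{aligned}
```

On these rounded figures, only about 3 million of the Google-indexed keys would come from UniChem or ChemSpider alone; the rest are already in PubChem.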
[6]
Triaging chemistry from text
• Identify the structure specification types, e.g.
– Semantic names (all sources)
– Code names (press releases, papers and abstracts)
– IUPAC names (papers, patents and abstracts)
– Images (papers, patents, & Google images)
– SMILES (open lab books)
– InChI strings (open lab books)
– SDF files (open lab books, & github)
Convert these to a structure (e.g. SDF, SMILES, InChI) then:
– Search InChIKey in Google
– Search major databases
– Search SureChemOpen
– Compare extracted sets for intersects and diffs
– Extend exact match connectivity with similarity searching
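The last step, extending exact matches with similarity searching, is usually done by comparing binary fingerprints with the Tanimoto coefficient. The sketch below assumes fingerprints have already been generated by some cheminformatics toolkit and are held as sets of on-bit indices; `tanimoto`, `similar_hits` and the 0.7 default threshold are illustrative choices, not settings taken from chemicalize.org, SureChemOpen or PubChem.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similar_hits(
    query: Set[int], library: Dict[str, Set[int]], threshold: float = 0.7
) -> List[Tuple[str, float]]:
    """Return (identifier, score) pairs at or above `threshold`, best first."""
    scored = [(ident, tanimoto(query, fp)) for ident, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For example, `tanimoto({1, 2, 3}, {2, 3, 4})` shares two bits out of four distinct ones and scores 0.5. The threshold is the main tuning knob: lower values pull in more distant analogues from other documents at the cost of more noise.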
[7]
PubChem Composition
[8]
SureChemOpen Composition (in PubChem)
[9]
Chemicalize.org Composition (in PubChem)
[10]
BACE2 Conjunctions
[11]
BACE2 Conjunctions
[12]
Chemicalize.org Triage
[13]
BACE2 Conjunctions
1. WO2013054291 > chemicalize.org
2. Download 450 structures
3. Upload to PubChem search
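The slide does not say which PubChem interface received the 450 structures. As one scriptable option, assuming the extracted structures have already been converted to InChIKeys, the sketch below queries PubChem's PUG REST service one key at a time and separates keys with existing CIDs from those with none. The function names are hypothetical; the 404-on-no-match handling and the five-requests-per-second throttle reflect our understanding of PUG REST behaviour and usage guidance and should be checked against the current PubChem documentation.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Dict, Iterable, List

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def inchikey_cid_url(inchikey: str) -> str:
    """PUG REST URL that returns matching CIDs as plain text, one per line."""
    return f"{PUG_REST}/compound/inchikey/{urllib.parse.quote(inchikey)}/cids/TXT"


def parse_cid_text(body: str) -> List[int]:
    """Turn a plain-text CID response into a list of integers."""
    return [int(token) for token in body.split()]


def lookup_cids(inchikeys: Iterable[str], pause: float = 0.2) -> Dict[str, List[int]]:
    """Map each InChIKey to its PubChem CIDs; an empty list means HTTP 404 (assumed: no match)."""
    results: Dict[str, List[int]] = {}
    for key in inchikeys:
        try:
            with urllib.request.urlopen(inchikey_cid_url(key), timeout=30) as resp:
                results[key] = parse_cid_text(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
            results[key] = []
        time.sleep(pause)  # throttle: stay under ~5 requests per second
    return results


def not_in_pubchem(results: Dict[str, List[int]]) -> List[str]:
    """Keys with no CID: candidates for chemistry unique to this extraction."""
    return [key for key, cids in results.items() if not cids]
```

Calling `lookup_cids` on the keys from one patent extraction and then `not_in_pubchem` on the result gives the triage split between already-known and potentially novel structures.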
[14]
Clustering document extraction sets: CheS-Mapper
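CheS-Mapper provides its own clustering and embedding methods for visualization; the sketch below is a deliberately simpler, swapped-in technique (greedy leader clustering on Tanimoto similarity) to show the idea of grouping an extraction set into chemical series. Fingerprints are assumed to be sets of on-bit indices produced elsewhere, and `leader_cluster` and its 0.6 default threshold are hypothetical choices.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: Dict[str, Set[int]], threshold: float = 0.6) -> List[List[str]]:
    """Greedy leader clustering: each structure joins the first cluster whose leader
    is at least `threshold` similar; otherwise it becomes the leader of a new cluster."""
    leaders: List[str] = []
    clusters: List[List[str]] = []
    for name, fp in fps.items():
        for i, leader in enumerate(leaders):
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[i].append(name)
                break
        else:  # no leader was similar enough
            leaders.append(name)
            clusters.append([name])
    return clusters
```

Leader clustering is fast (one pass, comparing only against leaders) but order-dependent: the same structures supplied in a different order can produce different clusters, which is one reason dedicated tools use more robust methods.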
[15]
Venny: intersects, diffs, de-dupes and merges
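Venny performs these operations in the browser on pasted identifier lists. The sketch below reproduces the same set logic offline, assuming each extraction has been reduced to a set of InChIKeys; the labels and the placeholder keys in the usage are hypothetical. A figure such as "20% unique" for one source can be computed the way `unique_fraction` does it.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set


def dedupe(keys: Iterable[str]) -> List[str]:
    """Drop blank and repeated InChIKeys, keeping first-seen order."""
    cleaned = (key.strip() for key in keys)
    return list(dict.fromkeys(key for key in cleaned if key))


def merge(sets: Dict[str, Set[str]]) -> Set[str]:
    """Union of all extraction sets."""
    return set().union(*sets.values())


def venn_regions(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Assign every key to the exact combination of sets containing it (one Venn region each)."""
    membership: Dict[str, Set[str]] = defaultdict(set)
    for label, keys in sets.items():
        for key in keys:
            membership[key].add(label)
    regions: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for key, labels in membership.items():
        regions[frozenset(labels)].add(key)
    return dict(regions)


def unique_fraction(label: str, sets: Dict[str, Set[str]]) -> float:
    """Share of one set's keys that appear in no other set."""
    own = sets[label]
    others = set().union(*(keys for name, keys in sets.items() if name != label))
    return len(own - others) / len(own) if own else 0.0
```

With hypothetical sets `{"patent": {"K1", "K2", "K3"}, "paper": {"K2", "K3", "K4"}, "chemicalize": {"K3", "K5"}}`, the region keyed by all three labels holds `K3`, and half of the chemicalize set is unique to it.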
[16]
Conclusions
• Transformative opening up of chemistry > biology via structure > document connectivity
• Open mining of patent metadata and data
• Expanding toolbox
• Inexorable expansion of open-access publishing
But:
• Journal chemistry extraction > database records still slow
• Text mining of journals still restricted
• Author annotation and direct db submission rare
• Pharmaceutical research publications are still blinding structures (see
PMID: 23159359)
[17]
References
http://www.slideshare.net/cdsouthan/the-patent-chemistry-big-bang-in-pubchem
http://www.slideshare.net/cdsouthan/cs-cax-bioitchemicalizeposter03apr
http://www.ncbi.nlm.nih.gov/pubmed/23399051
http://www.ncbi.nlm.nih.gov/pubmed/23618056
http://www.ncbi.nlm.nih.gov/pubmed/23506624

 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

EUGM 2013 - Christopher Southan (TW2Informatics): Chemicalize.org, SureChemOpen, PubChem and the InChIKey: A heavenly conjunction with transformative utility

  • 1. [1] Chemicalize.org, SureChemOpen, PubChem and the InChIKey: A heavenly conjunction with transformative utility Christopher Southan, TW2Informatics, Göteborg, Sweden, ChemAxon UGM, Budapest, May 2013 Image credit: http://www.eso.org/public/images/yb_vlt_moon_cnn_cc/
  • 2. [2] Dr Christopher Southan, Ph.D., M.Sc., B.Sc. TW2Informatics: http://www.cdsouthan.info/Consult/CDS_cons.htm Mobile: +46(0)702-530710 Skype: cdsouthan Email: cdsouthan@hotmail.com Twitter: http://twitter.com/#!/cdsouthan Blog: http://cdsouthan.blogspot.com/ LinkedIn: http://www.linkedin.com/in/cdsouthan Publications: http://www.citeulike.org/user/cdsouthan/order/year,,/publications Presentations: http://www.slideshare.net/cdsouthan
  • 3. [3] Abstract: ChemAxon's name-to-structure functionality is not only a component of the SureChem patent extraction pipeline but also powers chemicalize.org. Both operations now submit their structures to PubChem as sources. The SureChem deposition brings the total of patent-extracted structures in PubChem to 14.5 million CIDs. The chemicalize.org deposition is smaller (~0.3 million), but these structures have been actively selected by users and 20% of them are unique. The final conjunction is that all three sources generate the InChIKey (IK), which turns Google into a de facto merge of PubChem and ChemSpider covering ~50 million structures. Chemicalize.org users can convert new patents, other external or internal documents, and web-based text. Individual results can be Googled or searched against SureChemOpen, and bulk extractions can be triaged against PubChem. It thus becomes possible to connect chemistry between patents, papers, abstracts and database records via exact match or similarity searching. When SureChem and chemicalize.org update their submissions, relationships with the other ~200 PubChem sources (including ChEMBL and vendor databases) are recomputed and new CID links are made. The synergy between SureChem and chemicalize.org is powerful because matches between them (~0.15 million) via SureChemOpen give occurrence statistics and the location of the structure within patents. The applications of chemicalize.org are extended by web tools such as Venny, for determining intersects between multiple extractions, and CheS-Mapper, for cluster visualization. These utility expansions will be illustrated by documents specifying BACE1 inhibitors for Alzheimer’s disease.
  • 4. [4] Auspicious Conjunctions 2012-13 • PubChem: global chemistry to slice ‘n’ dice • SureChemOpen: majority of patent chemistry opened up • Chemicalize.org: chemistry extractable from any text tombs • Chemical images: patents extracted in SureChemOpen, OSRA handles papers • InChIKey indexing in Google • ChemSpider: crowdsourcing chemistry quality • Expanding toolbox, e.g. OPSIN, Venny, CheS-Mapper • SciBite alerts • Expanding preview and surfacing options, e.g. ChEMBL-NTD, GitHub, OSDD, Open Lab Books, figshare etc. • Rise of mobile chemistry
  • 5. [5] Databases <> structures <> documents (diagram linking Abstracts, Patents and Papers; figures shown: 15 mill, 0.2 mill (MeSH), 0.8 mill (ChEMBL), 12K). Google InChIKey coverage: ~50 million (47m PubChem + 33m UniChem + 28m ChemSpider)
  • 6. [6] Triaging chemistry from text • Identify the structure specification types, e.g. – Semantic names (all sources) – Code names (press releases, papers and abstracts) – IUPAC names (papers, patents and abstracts) – Images (papers, patents and Google Images) – SMILES (open lab books) – InChI strings (open lab books) – SDF files (open lab books and GitHub) • Convert these to a structure (e.g. SDF, SMILES, InChI), then: – Search the InChIKey in Google – Search major databases – Search SureChemOpen – Compare extracted sets for intersects and diffs – Extend exact-match connectivity with similarity searching
  • 13. [13] BACE2 Conjunctions 1. WO2013054291 > chemicalize.org 2. Download 450 structures 3. Upload to PubChem search
  • 15. [15] Venny: intersects, diffs, de-dupes and merges
  • 16. [16] Conclusions • Transformative opening up of chemistry > biology via structure > document connectivity • Open mining of patent metadata and data • Expanding toolbox • Inexorable expansion of open-access publishing But: • Journal chemistry extraction > database records is still slow • Text mining of journals is still restricted • Author annotation and direct database submission are rare • Pharmaceutical research publications are still blinding structures (see PMID: 23159359)
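The talk relies on the InChIKey as the join key that lets a structure extracted by chemicalize.org be Googled or looked up in PubChem. The following is a minimal sketch, assuming the standard 27-character InChIKey layout, of how an extracted key might be validated, split into its blocks, and turned into a quoted Google query and a PubChem PUG REST CID lookup URL. The function names `parse_inchikey` and `lookup_urls` are hypothetical, and the sketch only builds URLs; it does not fetch anything.

```python
import re
from urllib.parse import quote, urlencode

# Standard InChIKey layout (27 characters):
#   14-char hash of the connectivity layer, "-",
#   8-char hash of the remaining layers (stereo, isotopes, etc.),
#   1 flag char ("S" = standard, "N" = non-standard), 1 version char, "-",
#   1 protonation char ("N" = neutral).
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks; raise ValueError if it is malformed."""
    match = INCHIKEY_RE.match(key.strip().upper())
    if match is None:
        raise ValueError(f"not a valid InChIKey: {key!r}")
    connectivity, other_layers, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,
        "other_layers": other_layers,
        "standard": flag == "S",
        "version": version,
        "protonation": protonation,
    }


def lookup_urls(key: str) -> dict:
    """Build (but do not fetch) a quoted Google search and a PubChem CID lookup."""
    parse_inchikey(key)  # reject malformed keys before building URLs
    key = key.strip().upper()
    return {
        "google": "https://www.google.com/search?" + urlencode({"q": f'"{key}"'}),
        "pubchem_cids": (
            "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/"
            + quote(key)
            + "/cids/JSON"
        ),
    }


if __name__ == "__main__":
    aspirin = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
    print(parse_inchikey(aspirin)["connectivity"])
    print(lookup_urls(aspirin)["pubchem_cids"])
```

Quoting the key in the Google query asks for the exact 27-character string, which is what makes the search behave like an exact-structure match across indexed databases.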
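Venny is used in the talk for intersects, diffs, de-dupes and merges between extractions. The same operations can be approximated on lists of InChIKeys with plain set algebra. This is a hypothetical sketch, not Venny itself: it assumes each extraction has already been reduced to one InChIKey per structure. It also adds a looser comparison on the first 14-character block (the connectivity hash), which matches structures that differ only in stereo, isotope or protonation layers.

```python
from typing import Dict, Iterable, List, Set


def dedupe(keys: Iterable[str]) -> List[str]:
    """Normalise case, drop blanks and duplicates, keep first-seen order."""
    seen: Set[str] = set()
    unique: List[str] = []
    for key in keys:
        key = key.strip().upper()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def compare_extractions(a: Iterable[str], b: Iterable[str]) -> Dict[str, Set[str]]:
    """Venny-style two-way comparison of full InChIKeys."""
    set_a, set_b = set(dedupe(a)), set(dedupe(b))
    return {
        "intersect": set_a & set_b,
        "only_a": set_a - set_b,
        "only_b": set_b - set_a,
        "merged": set_a | set_b,
    }


def connectivity_matches(a: Iterable[str], b: Iterable[str]) -> Set[str]:
    """Shared connectivity blocks (first 14 chars), ignoring the other layers."""
    return {k[:14] for k in dedupe(a)} & {k[:14] for k in dedupe(b)}


if __name__ == "__main__":
    patent_extraction = ["BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"]
    paper_extraction = ["RYYVLZVUVIJVGH-UHFFFAOYSA-N", "QNAYBMKLOCPYGJ-REOHCLBHSA-N"]
    print(compare_extractions(patent_extraction, paper_extraction)["intersect"])
```

Comparing full keys first and connectivity blocks second mirrors the triage order on slide 6: exact matches are unambiguous, while skeleton-level matches flag candidates that need a closer look.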
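Slide 6 ends by extending exact-match connectivity with similarity searching. The slides do not say which fingerprint or measure is used; Tanimoto similarity over fingerprint bits is a common choice and is swapped in here for illustration. In this sketch each fingerprint is assumed to be a set of "on" bit indices that some other tool has already computed. The bit values and compound names in the example are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits in a structural fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both fingerprints are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_hits(
    query: Fingerprint,
    library: Dict[str, Fingerprint],
    threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above threshold, best first, ties by name."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical fingerprints, for illustration only.
    library = {
        "cmpd_x": frozenset({1, 2, 3, 4}),
        "cmpd_y": frozenset({1, 2, 3, 5}),
        "cmpd_z": frozenset({7, 8}),
    }
    print(similar_hits(frozenset({1, 2, 3, 4}), library, threshold=0.5))
```

The threshold is the main tuning knob: lowering it pulls in more distant analogues from the extracted sets at the cost of more false connections to review.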