The document discusses the issue of quality and accuracy of chemical structure information online. It presents ChemSpider, a platform for chemical structure data deposition, curation and linking to other online sources. ChemSpider aims to address the problem of incorrect and inconsistent chemical structure data across different websites through features like chemical structure markup, validation using InChI identifiers, and collaborative curation efforts. The goal is to build a single point of reference for reliable chemical structure information on the internet.
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
1. Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
Antony Williams
Bio-IT World 2009
2. Building a Structure Centric Community for Chemists
Linked Data Cloud
3. Chemistry on the Internet
Much of the information online is “User Beware!”
The quality of information is “diverse”
Technologies can “link and connect” information, but validation and curation are key to providing quality
The LinkedData web is of less value when the data linked are “wrong”
4. Quality Costs
Chemical Abstracts Service (CAS), a division of the ACS, is the “Gold Standard” in chemistry-related information
101 years of content, $260 million revenue (2006), >40 million substances and 60 million sequences
But online…
5. What is “wrong”?
6. A platform for:
Data deposition, curation and annotation
Supporting Open Notebook Science efforts
Chemistry document mark-up with ChemMantis
The Open Access ChemSpider Journal of Chemistry
7. Search Cholesterol
8. Search Cholesterol
9. Search Cholesterol
10. Search Cholesterol
11. Search Cholesterol
12. Search Cholesterol
13. Complex Data and Information
14. Online Data
Many websites host structure-based information
Question quality!!!
16. Wikipedia, C&E News, PubChem
C&E News (from ACS)
17. Does one stereocenter matter?
18. Vancomycin
Who will curate?
PubChem is not resourced to clean these errors
How would you clean such a large dataset?
19. Vancomycin
ChemSpider: 1 compound – 3 days
20. Question Everything
www.dhmo.org
21. DailyMed
“DailyMed provides high quality information about marketed drugs. This information includes FDA approved labels (package inserts).”
22. The FDA’s DailyMed
23. Structures on DailyMed
Poor Representations
24. Structures on DailyMed
Lack of Stereochemistry
25. Incorrect Structures
Scanning (?) Issues
26. Incorrect Structures
27. Does it Matter?
Does it matter to the consumer that the structures are wrong? No… what matters is that what is in the bottle is the right medication!
To make DailyMed structure searchable it DOES matter
To data mine DailyMed it matters
To mark up DailyMed it matters
28. Collaborative Knowledge Management for Chemists
29. Wikipedia Links to DrugBank
30. Taxol on PubChem
31. Taxol on DailyMed
32. The InChI Identifier
33. Multiple Layers
Source: Unofficial InChI FAQ page
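The layered structure can be made concrete by splitting an InChI string on its "/" separators. The following is a minimal Python sketch under stated assumptions: the function name split_inchi is hypothetical, the ethanol string is a small illustrative example that does not come from the slides, and the parsing is deliberately naive (it labels each layer by its leading letter and does not validate or interpret the chemistry).

```python
def split_inchi(inchi: str) -> list[tuple[str, str]]:
    """Split an InChI string into (layer label, content) pairs.

    Simplified on purpose: the first segment is the version prefix, the
    second is treated as the chemical formula, and every later segment is
    labelled by its leading letter (c = connections, h = hydrogens, ...).
    """
    prefix, *segments = inchi.strip().split("/")
    if not prefix.startswith("InChI="):
        raise ValueError("not an InChI string")
    layers = [("version", prefix[len("InChI="):])]
    if segments:
        layers.append(("formula", segments[0]))
    for segment in segments[1:]:
        if segment:  # ignore empty segments instead of failing
            layers.append((segment[0], segment[1:]))
    return layers


if __name__ == "__main__":
    # Illustrative input only (ethanol); not taken from the presentation.
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    for label, content in split_inchi(ethanol):
        print(f"{label:8s} {content}")
```

A list of pairs is used rather than a dictionary because some InChI strings repeat a layer letter, and a dictionary would silently overwrite the earlier entry.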
34. InChIStrings Hash to InChIKeys
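The idea of hashing a long InChI string into a short, fixed-length key can be illustrated with a toy example. This is explicitly NOT the official InChIKey algorithm: the function toy_key, the choice of which layers feed which block, and the letter mapping are all assumptions made for illustration, and the two input strings are InChI-style strings written from memory for the two mirror-image forms of alanine. The sketch only shows the principle that the skeleton layers and the remaining layers are hashed into separate blocks.

```python
import hashlib


def _letters(text: str, count: int) -> str:
    """Hash text with SHA-256 and map the first `count` bytes to A-Z."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(chr(ord("A") + byte % 26) for byte in digest[:count])


def toy_key(inchi: str) -> str:
    """Toy two-block key. NOT the official InChIKey algorithm."""
    segments = inchi.split("/")[1:]  # drop the "InChI=..." version prefix
    # Assumption: formula plus connection (c) and hydrogen (h) layers
    # describe the skeleton and feed the first block.
    skeleton = [s for i, s in enumerate(segments)
                if i == 0 or s[:1] in ("c", "h")]
    # Everything else (stereo, isotopes, ...) feeds the second block.
    rest = [s for i, s in enumerate(segments)
            if i != 0 and s[:1] not in ("c", "h")]
    return _letters("/".join(skeleton), 14) + "-" + _letters("/".join(rest), 10)


if __name__ == "__main__":
    # Illustrative strings that differ only in one stereo layer (/m0 vs /m1).
    form_a = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
    form_b = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
    print(toy_key(form_a))
    print(toy_key(form_b))
```

Run on the two strings, the first block comes out identical and the second block differs, which is the property that makes block-wise comparison of keys useful for spotting stereochemistry disagreements between sources.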
35. InChIs for Taxol
36. Back to Taxol
DrugBank: RCINICONZNJXQF-CLDWUXIMDD
ChEBI: RCINICONZNJXQF-GXKQXQCDDN
Wikipedia: RCINICONZNJXQF-MZXODVADBJ
Which one is correct???
37. InChIKeys for Taxol
DrugBank: RCINICONZNJXQF-CLDWUXIMDD
ChEBI: RCINICONZNJXQF-GXKQXQCDDN
Wikipedia: RCINICONZNJXQF-MZXODVADBJ
ChEBI and Wikipedia are the SAME structure
DrugBank is a DIFFERENT structure – ONE stereocenter
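This kind of comparison can be automated by splitting each InChIKey at its first hyphen: a shared first block indicates the same connectivity hash, while a differing remainder points to a difference in stereochemistry or another detail layer. The sketch below is a minimal Python illustration; the function name compare_inchikeys and its return strings are hypothetical, and the two keys are copied verbatim from the slide.

```python
def compare_inchikeys(key_a: str, key_b: str) -> str:
    """Classify two InChIKeys by comparing them block by block."""
    first_a, _, rest_a = key_a.strip().partition("-")
    first_b, _, rest_b = key_b.strip().partition("-")
    if first_a != first_b:
        return "different skeleton"
    if rest_a != rest_b:
        return "same skeleton, different stereo or other detail"
    return "identical key"


if __name__ == "__main__":
    # Keys as listed on the slide for Taxol.
    drugbank = "RCINICONZNJXQF-CLDWUXIMDD"
    chebi = "RCINICONZNJXQF-GXKQXQCDDN"
    print("DrugBank vs ChEBI:", compare_inchikeys(drugbank, chebi))
    print("DrugBank vs DrugBank:", compare_inchikeys(drugbank, drugbank))
```

Because the keys are hashes, a matching first block is strong evidence of identical connectivity rather than a proof, and a mismatch in the remainder says that something differs without saying which stereocenter; locating the difference requires the full InChI strings or the structures themselves.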
38. The InChI Resolver
40. Coming Soon… Linked Articles
41. How bad can it get???
And who is right????
42. ChemMantis
Chemical Markup And Nomenclature Transformation Integrated System – ChemMantis
A platform for entity extraction for chemistry documents, markup and integration to online information sources – Wikipedia, ChemSpider, Entrez…
Web-based submission, markup and publishing platform now hosting the ChemSpider Journal of Chemistry
43. ChemMantis Markup
44. Enable Electronic Articles…
Structures are the language of chemistry
Show structures to chemists and search/link from there…
45. Species Markup
46. Dictionaries are Easily Enhanced
Copy-Paste into appropriate Entity Dictionary
Impacts all future markups
Expanding knowledgebases of information
Linked out to rich sources of information
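The effect of enhancing a dictionary can be sketched with a small dictionary-driven tagger. This is not ChemMantis code: the function mark_up, the bracketed output format, and the sample entries are all hypothetical, chosen only to show how adding one dictionary entry changes every markup performed afterwards.

```python
import re


def mark_up(text: str, dictionary: dict[str, str]) -> str:
    """Wrap every dictionary term found in text as [term|entity type]."""
    if not dictionary:
        return text
    # Try longer names first so multi-word terms win over shorter ones.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    lookup = {name.lower(): kind for name, kind in dictionary.items()}

    def tag(match: re.Match) -> str:
        term = match.group(0)
        return f"[{term}|{lookup[term.lower()]}]"

    return pattern.sub(tag, text)


if __name__ == "__main__":
    # Hypothetical entity dictionary: term -> entity type.
    entities = {"cholesterol": "chemical", "vancomycin": "chemical"}
    sentence = "Vancomycin was studied alongside cholesterol in Homo sapiens."
    print(mark_up(sentence, entities))

    # Adding one entry affects all later markups of any document.
    entities["Homo sapiens"] = "species"
    print(mark_up(sentence, entities))
```

Plain string matching like this cannot resolve ambiguous names or link a term to a structure; it only illustrates why a shared, growing dictionary improves every subsequent markup.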
47. Build Dictionaries
Ontologies Next
49. Publishers and Document Mark-Up
50. ChemSpider Everywhere
Linked from Wikipedia
Linked from Open Notebook Science sites using EMBED
Linked from Blogs using Structure/Spectra EMBED
Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
Integrated to software offerings from Thermo, Waters, Agilent, Bruker
51. ChemSpider Everywhere
Embed Functionality (like YouTube)
52. ChemSpider Everywhere
www.spectralgame.com
53. ChemSpider Everywhere
Crowdsourced Curation of Spectra
54. ChemSpider Everywhere
RSC Compounds
55. ChemSpider Everywhere
Nature Chemistry
Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text.
Those compounds are linked out to other information resources including PubChem and ChemSpider.
56. ChemSpider Everywhere
ChemMobi
57. Structure RSS Feeds with InChIs
59. Acknowledgments
Richard Kidd, Royal Society of Chemistry
Jason Wilde, Nature Publishing Group
Martin Walker and the Wikipedia Chemistry team
Microsoft – Rudy Potenzone
Symyx – Keith Taylor and James Jack
SureChem – Nicko Goncharoff
Spectral game - Andrew Lang and Jean-Claude Bradley
“The InChI team and Advisory Group”
60. Conclusions
www.chemspider.com
www.chemspider.com/journal
InChIs and Internet Chemistry
http://inchis.chemspider.com