Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
Antony Williams
Bio-IT World 2009

Building a Structure Centric Community
for Chemists
Linked Data Cloud

Chemistry on the Internet
• Much of the information online is User Beware!
• The quality of information is “diverse”
• Technologies can “link and connect” information, but validation and curation are key to providing quality
• The Linked Data web is of less value when the linked data are “wrong”

Quality Costs
• Chemical Abstracts Service (CAS), a division of the ACS, is the “Gold Standard” in chemistry-related information
• 101 years of content, $260 million revenue (2006), >40 million substances and 60 million sequences
• But online…

What is “wrong”?

A platform for:
• Data deposition, curation and annotation
• Supporting Open Notebook Science efforts
• Chemistry document mark-up with ChemMantis
• The Open Access ChemSpider Journal of Chemistry

Search Cholesterol

Complex Data and Information

Online Data
• Many websites host structure-based information
• Question quality!!!

Wikipedia, C&E News, PubChem
C&E News (from ACS)

Does one stereocenter matter?

Vancomycin
• Who will curate?
• PubChem is not resourced to clean these errors
• How would you clean such a large dataset?

Vancomycin
ChemSpider: 1 compound – 3 days

Question Everything
www.dhmo.org

DailyMed
“DailyMed provides high quality information about marketed drugs. This information includes FDA approved labels (package inserts).”

The FDA’s DailyMed

Structures on DailyMed
Poor Representations

Structures on DailyMed
Lack of Stereochemistry

Incorrect Structures
Scanning (?) Issues

Incorrect Structures

Does it Matter?
• Does it matter to the consumer that the structures are wrong? No…what matters is that what is in the bottle is the right medication!
• To make DailyMed structure-searchable it DOES matter
• To data mine DailyMed it matters
• To mark up DailyMed it matters

Collaborative Knowledge Management for Chemists

Wikipedia Links to DrugBank

Taxol on PubChem

Taxol on DailyMed

The InChI Identifier

Multiple Layers
Source: Unofficial InChI FAQ page
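
The layered structure of an InChI string can be sketched in a few lines of Python. This is a minimal illustration, not the official InChI software: the function name `split_inchi`, the informal layer labels, and the sample input (an InChI for L-alanine, which does not appear on the slides) are all assumptions made for the example.

```python
# Minimal sketch: split an InChI string into its "/"-separated layers.
# The prefix letters are the standard InChI layer prefixes; the labels
# on the right are informal names chosen for this example.
LAYER_LABELS = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protonation",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion flag",
    "s": "stereo type",
    "i": "isotopes",
}


def split_inchi(inchi):
    """Return a dict of layer label -> layer content for one InChI string."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *layers = inchi[len("InChI="):].split("/")
    result = {"version": version, "formula": formula}
    for layer in layers:
        if not layer:
            continue  # skip empty segments
        # Unknown prefixes are kept under their raw letter. A repeated
        # prefix would overwrite the earlier entry in this simple sketch.
        label = LAYER_LABELS.get(layer[0], layer[0])
        result[label] = layer[1:]
    return result


# Sample input only (assumed for illustration, not taken from the slides).
L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"

for label, content in split_inchi(L_ALANINE).items():
    print(f"{label:>22}: {content}")
```

Two structures that differ at a single stereocenter share the formula, connectivity and hydrogen layers and differ only in the stereo layers, which is why a layered identifier makes such differences easy to locate.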

InChI Strings Hash to InChIKeys
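
The idea of hashing an InChI string into a short, fixed-length key can be illustrated with a toy example. This is deliberately NOT the real InChIKey algorithm: the real key is also built on SHA-256 but uses its own truncation and letter encoding, so the keys below will not match real InChIKeys. The function names, the split of layers into "major" and "minor", and the two alanine InChI strings are assumptions made for the sketch.

```python
import hashlib
import string

# Layers treated as "minor" in this toy: stereo and isotope layers.
# Everything else (formula, connectivity, hydrogens, ...) is "major".
MINOR_PREFIXES = ("b", "t", "m", "s", "i")


def letters(text, length):
    """Map the first `length` bytes of a SHA-256 digest to uppercase letters."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(string.ascii_uppercase[byte % 26] for byte in digest[:length])


def toy_key(inchi):
    """Toy two-block key: one block per group of layers (not a real InChIKey)."""
    layers = inchi.split("/")[1:]  # drop the "InChI=1S" version prefix
    major = [layer for layer in layers if not layer.startswith(MINOR_PREFIXES)]
    minor = [layer for layer in layers if layer.startswith(MINOR_PREFIXES)]
    return letters("/".join(major), 14) + "-" + letters("/".join(minor), 8)


# Sample inputs (assumed): the two enantiomers of alanine, which differ
# only in the stereo inversion flag (/m0 versus /m1).
L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

print(toy_key(L_ALANINE))
print(toy_key(D_ALANINE))
```

Because the first block is computed only from the layers the two inputs share, both toy keys start with the same 14 letters, while the second block changes with the stereo layers. A fixed-length key of this kind is easier to index and search on the web than a long InChI string, at the cost of no longer being reversible to a structure.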

InChIs for Taxol

Back to Taxol
• DrugBank: RCINICONZNJXQF-CLDWUXIMDD
• ChEBI: RCINICONZNJXQF-GXKQXQCDDN
• Wikipedia: RCINICONZNJXQF-MZXODVADBJ
• Which one is correct???

InChIKeys for Taxol
• DrugBank: RCINICONZNJXQF-CLDWUXIMDD
• ChEBI: RCINICONZNJXQF-GXKQXQCDDN
• Wikipedia: RCINICONZNJXQF-MZXODVADBJ
• ChEBI and Wikipedia are the SAME structure
• DrugBank is a DIFFERENT structure – ONE stereocenter
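
A first step in comparing keys like these can be sketched as follows, assuming the keys are used exactly as transcribed on the slide and that the block before the hyphen reflects connectivity while the block after it reflects the remaining layers such as stereochemistry. The helper name `group_by_block` is hypothetical. The sketch only groups sources by block; it does not decide which source is correct.

```python
from collections import defaultdict

# Keys as transcribed on the slide.
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def group_by_block(keys, block_index):
    """Group source names by one hyphen-separated block of their key."""
    groups = defaultdict(list)
    for source, key in keys.items():
        groups[key.split("-")[block_index]].append(source)
    return dict(groups)


# Sources sharing a first block agree on the connectivity of the molecule.
skeletons = group_by_block(TAXOL_KEYS, 0)
for block, sources in skeletons.items():
    print(f"first block {block}: {', '.join(sorted(sources))}")

# The second blocks show where any remaining disagreement lives.
for source, key in TAXOL_KEYS.items():
    print(f"{source:>10}: second block {key.split('-')[1]}")
```

All three sources share the first block, so they agree on how the atoms are connected; any disagreement between them sits in the layers behind the second block, which is where a single stereocenter difference would show up.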

The InChI Resolver

Coming Soon…Linked Articles

How bad can it get???
And who is right????

ChemMantis
• Chemical Markup And Nomenclature Transformation Integrated System – ChemMantis
• A platform for entity extraction for chemistry documents, markup and integration to online information sources – Wikipedia, ChemSpider, Entrez…
• Web-based submission, markup and publishing platform now hosting the ChemSpider Journal of Chemistry

ChemMantis Markup

Enable Electronic Articles…
• Structures are the language of chemistry
• Show structures to chemists and search/link from there…

Species Markup

Dictionaries are Easily Enhanced
• Copy-Paste into appropriate Entity Dictionary
• Impacts all future markups
• Expanding knowledge bases of information
• Linked out to rich sources of information

Build Dictionaries
Ontologies Next
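
Dictionary-driven markup of the kind described on these slides can be sketched as below. This is a generic illustration, not ChemMantis itself: the function `mark_up`, the dictionary layout, the HTML output and the example.org link targets are all hypothetical placeholders.

```python
import re


def mark_up(text, dictionary):
    """Wrap every dictionary term found in `text` in a typed link.

    `dictionary` maps a lowercase term to (entity type, link target).
    """
    if not dictionary:
        return text
    # Longest terms first, so a longer name wins over a shorter one inside it.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )

    def link(match):
        kind, url = dictionary[match.group(0).lower()]
        return f'<a class="{kind}" href="{url}">{match.group(0)}</a>'

    return pattern.sub(link, text)


# Hypothetical entity dictionary with placeholder link targets.
entities = {
    "taxol": ("chemical", "https://example.org/chemical/taxol"),
}
sentence = "Taxol was isolated from the bark of Taxus brevifolia."

before = mark_up(sentence, entities)
print(before)

# Adding one entry to the dictionary changes every later markup run.
entities["taxus brevifolia"] = ("species", "https://example.org/species/taxus-brevifolia")
after = mark_up(sentence, entities)
print(after)
```

The second run links the species name without any change to the markup code, which is the sense in which a new dictionary entry affects all future markups.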

Outlinks…

Publishers and Document Mark-Up

ChemSpider Everywhere
• Linked from Wikipedia
• Linked from Open Notebook Science sites using EMBED
• Linked from blogs using Structure/Spectra EMBED
• Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
• Integrated into software offerings from Thermo, Waters, Agilent, Bruker

ChemSpider Everywhere
Embed Functionality (like YouTube)

ChemSpider Everywhere
www.spectralgame.com

ChemSpider Everywhere
Crowdsourced Curation of Spectra

ChemSpider Everywhere
RSC Compounds

ChemSpider Everywhere
Nature Chemistry
Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Those compounds are linked out to other information resources including PubChem and ChemSpider.

ChemSpider Everywhere
ChemMobi

Structure RSS Feeds with InChIs

Acknowledgments
• Richard Kidd, Royal Society of Chemistry
• Jason Wilde, Nature Publishing Group
• Martin Walker and the Wikipedia Chemistry team
• Microsoft – Rudy Potenzone
• Symyx – Keith Taylor and James Jack
• SureChem – Nicko Goncharoff
• Spectral Game – Andrew Lang and Jean-Claude Bradley
• “The InChI team and Advisory Group”

Conclusions
www.chemspider.com
www.chemspider.com/journal
InChIs and Internet Chemistry
http://inchis.chemspider.com

Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry

  • 1. Crowdsourcing, Collaborations andCrowdsourcing, Collaborations and Text-Mining in a World of OpenText-Mining in a World of Open ChemistryChemistry Antony WilliamsAntony Williams Bio-IT World 2009Bio-IT World 2009
  • 2. Building a Structure Centric Community for Chemists Linked Data CloudLinked Data Cloud
  • 3. Building a Structure Centric Community for Chemists Chemistry on the InternetChemistry on the Internet  Much of the information online isMuch of the information online is User Beware!User Beware!  The Quality of information is “diverse”The Quality of information is “diverse”  Technologies can “link and connect” information butTechnologies can “link and connect” information but validation and curation is key to providing qualityvalidation and curation is key to providing quality  The LinkedData web is of less value when the data linkedThe LinkedData web is of less value when the data linked are “wrong”are “wrong”
  • 4. Building a Structure Centric Community for Chemists Quality CostsQuality Costs  Chemical Abstracts ServiceChemical Abstracts Service (CAS), a division of the(CAS), a division of the ACS is “Gold Standard” in Chemistry relatedACS is “Gold Standard” in Chemistry related informationinformation  101 years of content, $260 million revenue (2006), >40101 years of content, $260 million revenue (2006), >40 million substances and 60 million sequencesmillion substances and 60 million sequences  But online…But online…
  • 5. Building a Structure Centric Community for Chemists What is “wrong”?What is “wrong”?
  • 6. Building a Structure Centric Community for Chemists  A platform for:A platform for:  Data deposition,Data deposition, curation and annotationcuration and annotation  Supporting Open Notebook Science effortsSupporting Open Notebook Science efforts  Chemistry document mark-up with ChemMantisChemistry document mark-up with ChemMantis  The Open Access ChemSpider Journal of ChemistryThe Open Access ChemSpider Journal of Chemistry
  • 7. Search Cholesterol
  • 8. Search Cholesterol
  • 9. Search Cholesterol
  • 10. Search Cholesterol
  • 11. Search Cholesterol
  • 12. Search Cholesterol
  • 13. Complex Data and Information
  • 14. Online Data
      • Many websites host structure-based information
      • Question quality!!!
  • 16. Wikipedia, C&E News, PubChem
      • C&E News (from ACS)
  • 17. Does one stereocenter matter?
  • 18. Vancomycin
      • Who will curate?
      • PubChem is not resourced to clean these errors
      • How would you clean such a large dataset?
  • 19. Vancomycin
      • ChemSpider: 1 compound, 3 days
  • 20. Question Everything
      • www.dhmo.org
  • 21. DailyMed
      • “DailyMed provides high quality information about marketed drugs. This information includes FDA approved labels (package inserts).”
  • 22. The FDA’s DailyMed
  • 23. Structures on DailyMed: Poor Representations
  • 24. Structures on DailyMed: Lack of Stereochemistry
  • 25. Incorrect Structures: Scanning (?) Issues
  • 26. Incorrect Structures
  • 27. Does it Matter?
      • Does it matter to the consumer that the structures are wrong? No… what matters is that what is in the bottle is the right medication!
      • To make DailyMed structure searchable it DOES matter
      • To data mine DailyMed it matters
      • To mark up DailyMed it matters
  • 28. Collaborative Knowledge Management for Chemists
  • 29. Wikipedia Links to DrugBank
  • 30. Taxol on PubChem
  • 31. Taxol on DailyMed
  • 32. The InChI Identifier
  • 33. Multiple Layers
      • Source: Unofficial InChI FAQ page
  • 34. InChIStrings Hash to InChIKeys
  • 35. InChIs for Taxol
  • 36. Back to Taxol
      • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      • Which one is correct???
  • 37. InChIKeys for Taxol
      • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      • ChEBI and Wikipedia are the SAME structure
      • Drugbank is a DIFFERENT structure: ONE stereocenter
  • 38. The InChI Resolver
  • 40. Coming Soon… Linked Articles
  • 41. How bad can it get??? And who is right????
  • 42. ChemMantis
      • Chemical Markup And Nomenclature Transformation Integrated System (ChemMantis)
      • A platform for entity extraction for chemistry documents, markup and integration to online information sources: Wikipedia, ChemSpider, Entrez…
      • Web-based submission, markup and publishing platform now hosting the ChemSpider Journal of Chemistry
  • 43. ChemMantis Markup
  • 44. Enable Electronic Articles…
      • Structures are the language of chemistry
      • Show structures to chemists and search/link from there…
  • 45. Species Markup
  • 46. Dictionaries are Easily Enhanced
      • Copy-Paste into appropriate Entity Dictionary
      • Impacts all future markups
      • Expanding knowledgebases of information
      • Linked out to rich sources of information
  • 47. Build Dictionaries
      • Ontologies Next
  • 48. Outlinks…
  • 49. Publishers and Document Mark-Up
  • 50. ChemSpider Everywhere
      • Linked from Wikipedia
      • Linked from Open Notebook Science sites using EMBED
      • Linked from Blogs using Structure/Spectra EMBED
      • Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
      • Integrated to software offerings from Thermo, Waters, Agilent, Bruker
  • 51. ChemSpider Everywhere: Embed Functionality (like YouTube)
  • 52. ChemSpider Everywhere: www.spectralgame.com
  • 53. ChemSpider Everywhere: Crowdsourced Curation of Spectra
  • 54. ChemSpider Everywhere: RSC Compounds
  • 55. ChemSpider Everywhere: Nature Chemistry
      • Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Those compounds are linked out to other information resources including PubChem and ChemSpider.
  • 56. ChemSpider Everywhere: ChemMobi
  • 57. Structure RSS Feeds with InChIs
  • 59. Acknowledgments
      • Richard Kidd, Royal Society of Chemistry
      • Jason Wilde, Nature Publishing Group
      • Martin Walker and the Wikipedia Chemistry team
      • Microsoft: Rudy Potenzone
      • Symyx: Keith Taylor and James Jack
      • SureChem: Nicko Goncharoff
      • Spectral game: Andrew Lang and Jean-Claude Bradley
      • “The InChI team and Advisory Group”
  • 60. Conclusions
      • www.chemspider.com
      • www.chemspider.com/journal
      • InChIs and Internet Chemistry: http://inchis.chemspider.com
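
The Taxol comparison on slides 36 and 37 can be sketched in a few lines of Python. This is only an illustration, not code from ChemSpider or the InChI project: the function names are hypothetical, the keys are copied exactly as they appear in the slide transcript, and reading the first hyphen-separated block as a hash of the molecular skeleton (with the remaining characters covering other layers such as stereochemistry) is an assumption about the InChIKey layout used at the time.

```python
# InChIKeys for Taxol, copied as transcribed from slides 36 and 37.
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def split_key(key):
    """Split a key at its first hyphen into (first_block, remainder).

    Assumption: the first block hashes the skeleton (connectivity) and the
    remainder hashes the other layers, such as stereochemistry.
    """
    first_block, remainder = key.split("-", 1)
    return first_block, remainder


def group_by_first_block(keys):
    """Group source names by the first block of their key (hypothetical helper)."""
    groups = {}
    for source, key in keys.items():
        first_block, _ = split_key(key)
        groups.setdefault(first_block, []).append(source)
    return groups


if __name__ == "__main__":
    for first_block, sources in group_by_first_block(TAXOL_KEYS).items():
        print(first_block, "->", ", ".join(sources))
```

All three transcribed keys fall into one group, which suggests the sources agree on the skeleton and that any disagreement sits in the layers hashed into the rest of the key. A string comparison like this can flag a mismatch between sources, but it cannot say which source is right; that still requires curation.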