eScience Resources for the Chemistry Community from the Royal Society of Chemistry
Antony Williams
NCSU, College of Textiles
October 2nd, 2013
We Have …Too Much Data!!!
The World of Online Chemistry
• Property databases
• Compound aggregators
• Screening assay results
• Scientific publications
• Encyclopedic articles (Wikipedia)
• Metabolic pathway databases
• ADME/Tox data – eTOX for example
• Blogs/Wikis and Open Notebook Science
e-Science and Primary Data
• How much data generated in a lab, that COULD go public, is lost forever?
• Public Domain reference databases of value?
– Syntheses
– Properties
– Spectra
– CIFs
– Images
• Much of chemistry is chemical structure-based – where and how could we host these data?
RSC’s ChemSpider
ChemSpider
• >29 million unique chemicals from >500 data sources
• Focus on improving data quality, enhancing functionality, integrating and enabling
Crowdsourced “Annotations”
• Users can add
– Descriptions/Syntheses/Commentaries
– Links to PubMed articles
– Links to articles via DOIs
– Spectral data
– Crystallographic Information Files
– Photos
– MP3 files
– Videos
Spectra
Chemistry Data online are messy
• We have inherited errors
• All public compound databases have errors
• “Incorrect” structures – assertions, timelines etc
• “Incorrect” names associated with structures
• Properties
• Links
• Publications
• ENORMOUS CHALLENGE
Crowdsourced Curation
• Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
Search “Vitamin H”
“Curate” Identifiers
Validated Name-Structure Dictionaries
• Chemical name dictionaries are used for:
• Text-mining (publications, patents)
– Used to index PubMed and link to Google Patents
• Linking to other databases – think Biology!
– When structures are not available drug names link
• Searching the web
– Names link to structures link to InChIs
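The dictionary look-up behind name-based linking can be sketched roughly as below. This is a minimal Python illustration, not ChemSpider's implementation: the small NAME_TO_RECORD dictionary, the record labels and the function name find_chemical_names are all hypothetical.

```python
import re

# Hypothetical, hand-made dictionary: validated names and synonyms (lower-case)
# mapped to a made-up record label. A real dictionary would hold far more names.
NAME_TO_RECORD = {
    "biotin": "record-biotin",
    "vitamin h": "record-biotin",        # synonym of biotin
    "vincristine": "record-vincristine",
    "vancomycin": "record-vancomycin",
}

def find_chemical_names(text):
    """Return (matched name, record label) pairs found in text, in reading order."""
    # List longer names first so a multi-word synonym wins over a shorter name.
    names = sorted(NAME_TO_RECORD, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(0), NAME_TO_RECORD[m.group(0).lower()])
            for m in pattern.finditer(text)]

hits = find_chemical_names("Vitamin H and vincristine were mentioned in the abstract.")
```

Because both "Vitamin H" and "biotin" resolve to the same record label, a curated synonym list lets text written with either name link to one structure.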
I want to know about “Vincristine”
Vincristine: Identifiers and Properties
Vincristine: Vendors and Sources
Linked by Structure
Vincristine: Patents
Linked by Name
Vincristine: Articles
Linked by Name
Semantic Mark-up of Articles
Linking Names to Structures
The InChI Identifier
InChI Strings Hash to InChIKeys
Vancomycin – Search the Internet
Vancomycin
Search Molecular Skeleton
Search Full Molecule
Full Skeleton Search: 104 Hits
Full Molecule Search: 4 Hits
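The difference between the two searches can be sketched with the layout of a Standard InChIKey: the first 14-letter block is hashed from the connectivity (the skeleton), while the full 27-character key identifies one exact structure. The Python below is an illustrative sketch only; the function name search_terms is hypothetical, and the sample key is the widely published Standard InChIKey for aspirin rather than the vancomycin key used on the slides.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")

def search_terms(inchikey):
    """Split an InChIKey into a skeleton search term and a full-molecule term."""
    match = INCHIKEY_RE.match(inchikey.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return {
        # First block only: shared by structures with the same connectivity.
        "skeleton": match.group(1),
        # Whole key: matches this exact structure only.
        "full_molecule": match.group(0),
    }

# Sample input (aspirin), used purely as an example key.
terms = search_terms("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
```

Searching on the skeleton term is the broader query, which is consistent with the slides reporting more hits for the skeleton search than for the full-molecule search.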
ChemSpider Resources for Chemistry
Some usage statistics
• ca. 200 visitors at any one time, ~30,000 visits per day
• Mar 4-Apr 3, 2013
– Visits = 731,656
– Unique Visitors = 527,008
• Independent servers to support other projects
Publications - a summary of work
• Scientific publications are a summary of work
– Is all work reported?
– How much science is lost to pruning?
– What of value sits in notebooks and is lost?
• How much data is lost?
– How many compounds never reported?
– How many syntheses fail or succeed?
– How many characterization measurements?
About Me…as a Chemist
• I’ve performed a few dozen chemical syntheses
• I’ve run thousands of analytical spectra
• I’ve generated thousands of NMR assignments
• I’ve probably published <5% of all work
• Most of it has been lost
• But things can be different today….
• But it still needs to be associated with me…
Micropublishing Syntheses
ChemSpider SyntheticPages
Olympicene
So you Want a Profile???
Interactive Data
Rewards and Recognition
Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP.
The First Step badge is awarded when a user submits (& has published) their 1st CSSP article.
Integrate to instruments and software
• Integration to analytical instrumentation vendors already in place
– Agilent, Bruker, Thermo, Waters
• Also, Cheminformatics vendors link to ChemSpider
– Accelrys, ACD/Labs, ChemAxon, iChemLabs, and…
PharmaSea
• Dereplication via ChemSpider
• Segregation of natural products datasets
• Analytical data algorithms & integration
– Mass spec searching – predicted fragmentation
– NMR feature searching – NMR prediction
– Computer-assisted structure elucidation
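Mass-based dereplication can be sketched as a look-up of a measured neutral monoisotopic mass against known compounds within a ppm tolerance. The Python below is a minimal sketch under stated assumptions: the in-memory REFERENCE set stands in for a compound database, the function names and the 10 ppm default are hypothetical, and no ChemSpider service is called.

```python
# Monoisotopic masses of the most abundant isotopes (unified atomic mass units).
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}

def monoisotopic_mass(composition):
    """Sum isotope masses for an element-count mapping such as {"C": 9, "H": 8}."""
    return sum(MONOISOTOPIC[element] * count
               for element, count in composition.items())

# Hypothetical in-memory reference set standing in for a compound database.
REFERENCE = {
    "aspirin": {"C": 9, "H": 8, "O": 4},
    "glucose": {"C": 6, "H": 12, "O": 6},
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
}

def dereplicate(neutral_mass, tolerance_ppm=10.0):
    """Return reference names whose mass lies within tolerance_ppm of neutral_mass."""
    matches = []
    for name, composition in REFERENCE.items():
        mass = monoisotopic_mass(composition)
        error_ppm = abs(mass - neutral_mass) / mass * 1e6
        if error_ppm <= tolerance_ppm:
            matches.append(name)
    return matches

candidates = dereplicate(180.0423)
```

Aspirin and glucose share a nominal mass of 180, so only an accurate mass with a tight tolerance separates them; widening the tolerance returns both.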
It is so difficult to navigate…
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known Pathways?
• Working On Now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
Open PHACTS
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharma companies, Publishers….
ChemSpider Contributions
• The host of the chemistry services
– Supplier of “standardized” chemical data files
– Chemistry searching (structure, substructure, etc.)
– Curator and data quality checking
• We built the Open PHACTS chemical registration system
Open Source Drug Discovery
Chemical Database Service
• National Chemical Database Service for UK Academics
• Integrating Commercial Databases and Services
• Chemicals, analytical data, prediction algorithms
• Development of data repository
Community Repository for Data
• Funding agencies encourage sharing of data
• Increasing availability of “Open Data”
• Institutional repositories offer no domain-specific support
• Develop a community repository for chemistry data – private, public, embargoed
• Provides data to develop models and algorithms
Community Repository for Data
• Automated depositions of data
• DOI’ed data objects for citation purposes
• A database of reference data, but validated by the community
• National services feeding the repository – crystallography, mass spectrometry
• Integrate to blogging tools for chemistry
• Integrate to Electronic Lab Notebooks as feeds
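A deposited data object with a citable identifier and one of the three access states (private, public, embargoed) might be modelled along the following lines. This is only a sketch: the class and field names, the placeholder identifier and the embargo rule are assumptions for illustration, not the repository's actual design.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

class Access(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"

@dataclass
class Deposition:
    """One deposited data object; all field names here are illustrative."""
    identifier: str                       # e.g. a DOI assigned on deposit
    depositor: str
    access: Access
    embargo_until: Optional[date] = None  # only meaningful when embargoed

    def is_public_on(self, day: date) -> bool:
        """Public objects are always visible; embargoed ones once the date is reached."""
        if self.access is Access.PUBLIC:
            return True
        if self.access is Access.EMBARGOED and self.embargo_until is not None:
            return day >= self.embargo_until
        return False

# Placeholder identifier and depositor, not real records.
spectrum = Deposition("doi:placeholder/nmr-0001", "A. Chemist",
                      Access.EMBARGOED, embargo_until=date(2014, 1, 1))
```

Keeping the depositor on every record is what allows shared data to stay associated with the person who generated it.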
Model Building with Community Data
• Community data as a basis of model building
– Consume data from available databases, community data, new publications and build predictive algorithms for the community
– How many algorithms are reported and lost? How much repeat work is done in the domain of algorithmic development?
Inside our Publication Archive
• How much data is in the archive, in the publications and in the supplementary info?
– How many compounds for ChemSpider?
– How many syntheses for ChemSpider Reactions?
– How many characterization measurements?
• Property Data
• Spectral Data
• Graphs and charts to be used for modeling?
What if we could capture it all?
Digitally Enhancing the RSC Archive
Start with data in publications
Turn “Figures” Into Data
ChemSpider Reactions
• Starting with data from CSSP, MOS and CCR
• Will cover reactions extracted from:
• Patents
• RSC journal articles and ESI
E-Lab Notebooks
• Integration between ELNs and:
• ChemSpider
• ChemSpider Reactions
• Chemistry Data Repository
Internet Data
The Future
Commercial Software
Pre-competitive Data
Open Science
Open Data
Publishers
Educators
Open Databases
Chemical Vendors
Small organic molecules
Undefined materials
Organometallics
Nanomaterials
Polymers
Minerals
Particle bound
Links to Biologicals
The Future of Chemistry on the Web?
• Public compound databases federate & build a linked environment of validated data!
• Data validation needs are not ignored
• Publishers layer on information to make publications discoverable
• Open Data proliferate
• The “Semantic Web” will continue to develop…
Thank you
Email: williamsa@rsc.org
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 

eScience Resources for the Chemistry Community from the Royal Society of Chemistry

  • 1. eScience Resources for the Chemistry Community from the Royal Society of Chemistry Antony Williams NCSU, College of Textiles October 2nd 2013
  • 2. We Have …Too Much Data!!!
  • 3. The World of Online Chemistry • Property databases • Compound aggregators • Screening assay results • Scientific publications • Encyclopedic articles (Wikipedia) • Metabolic pathway databases • ADME/Tox data – eTOX for example • Blogs/Wikis and Open Notebook Science
  • 4. e-Science and Primary Data • How much data generated in a lab, that COULD go public, is lost forever?
  • 5. e-Science and Primary Data • How much data generated in a lab, that COULD go public, is lost forever? • Public Domain reference databases of value? – Syntheses – Properties – Spectra – CIFs – Images
  • 6. e-Science and Primary Data • How much data generated in a lab, that COULD go public, is lost forever? • Public Domain reference databases of value? – Syntheses – Properties – Spectra – CIFs – Images • Much of chemistry is chemical structure-based – where and how could we host these data?
  • 8. ChemSpider • >29 million unique chemicals from >500 data sources • Focus on improving data quality, enhancing functionality, integrating and enabling
  • 9. Crowdsourced “Annotations” • Users can add – Descriptions/Syntheses/Commentaries – Links to PubMed articles – Links to articles via DOIs – Add spectral data – Add Crystallographic Information Files – Add photos – Add MP3 files – Add Videos
  • 12. Chemistry Data online are messy • We have inherited errors • All public compound databases have errors • “Incorrect” structures – assertions, timelines etc • “Incorrect” names associated with structures • Properties • Links • Publications • ENORMOUS CHALLENGE
  • 13. Crowdsourced Curation • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • 18. Validated Name-Structure Dictionaries • Chemical name dictionaries are used for: • Text-mining (publications, patents) – Used to index PubMed and link to Google Patents • Linking to other databases – think Biology! – When structures are not available drug names link • Searching the web – Names link to structures link to InChIs
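The name-structure dictionary idea on slide 18 can be sketched as a simple lookup table. This is a minimal illustration, not ChemSpider's implementation: the record IDs (`REC-0001`, `REC-0002`), the function names, and the tiny synonym list are all hypothetical placeholders.

```python
import re

# Hypothetical dictionary: every validated synonym points at one record ID.
# The IDs are placeholders, not real ChemSpider identifiers.
NAME_TO_RECORD = {
    "biotin": "REC-0001",
    "vitamin h": "REC-0001",
    "vitamin b7": "REC-0001",
    "vincristine": "REC-0002",
}


def normalize(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def lookup(name: str):
    """Return the record ID for a name, or None if the name is unknown."""
    return NAME_TO_RECORD.get(normalize(name))


def tag_text(text: str) -> set:
    """Return the record IDs of all dictionary names found in free text."""
    cleaned = normalize(text)
    found = set()
    for name, record in NAME_TO_RECORD.items():
        # Word boundaries avoid matching a name inside a longer word.
        if re.search(r"\b" + re.escape(name) + r"\b", cleaned):
            found.add(record)
    return found
```

Because every synonym resolves to the same record, a search for "Vitamin H" and a search for "biotin" land on the same structure, which is what makes validated names usable for text-mining and for linking databases that hold no structures.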
  • 19. I want to know about “Vincristine”
  • 21. Vincristine: Vendors and Sources Linked by Structure
  • 25. Linking Names to Structures
  • 27. InChI Strings Hash to InChIKeys
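The idea on slide 27, that a variable-length InChI string is hashed into a fixed-length key suitable for web searching, can be illustrated with a toy function. This sketch deliberately does not reproduce the standard InChIKey encoding (which is derived from a SHA-256 hash but has its own layout and letter-encoding rules); `toy_key` is a hypothetical name and its output is not a valid InChIKey.

```python
import hashlib
import string


def toy_key(inchi: str, length: int = 14) -> str:
    """Map an InChI string to a fixed-length block of uppercase letters.

    Illustration only: real InChIKeys use a different, standardized encoding.
    """
    digest = hashlib.sha256(inchi.encode("utf-8")).digest()
    letters = string.ascii_uppercase
    # Reduce each of the first `length` digest bytes to one letter A-Z.
    return "".join(letters[byte % 26] for byte in digest[:length])


methane_key = toy_key("InChI=1S/CH4/h1H4")
water_key = toy_key("InChI=1S/H2O/h1H2")
```

The point of the design is that the key has the same length whatever the size of the molecule, and the same input always gives the same key, so it can be indexed by a search engine as an ordinary word.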
  • 28. Vancomycin – Search the Internet
  • 33. Some usage statistics • ca. 200 visitors at any one time, ~30,000 visits per day • Mar 4-Apr 3, 2013 – Visits = 731,656 – Unique Visitors = 527,008 • Independent servers to support other projects
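As a quick check of the figures on slide 33, the monthly totals can be turned into per-day and per-visitor averages. The sketch assumes the reporting window runs from March 4 to April 3, 2013 inclusive; the variable names are illustrative.

```python
from datetime import date

# Figures quoted on slide 33.
visits = 731_656
unique_visitors = 527_008

# Assumption: both end dates are included in the reporting window.
start, end = date(2013, 3, 4), date(2013, 4, 3)
days = (end - start).days + 1

visits_per_day = visits / days
visits_per_visitor = visits / unique_visitors

print(f"{days} days, {visits_per_day:.0f} visits/day, "
      f"{visits_per_visitor:.2f} visits per unique visitor")
```

Averaged over the whole window this gives roughly 23,600 visits per day, so the "~30,000 visits per day" on the slide presumably reflects busier days or a different measurement; the source does not say which.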
  • 34. Publications - a summary of work • Scientific publications are a summary of work – Is all work reported? – How much science is lost to pruning? – What of value sits in notebooks and is lost? • How much data is lost? – How many compounds never reported? – How many syntheses fail or succeed? – How many characterization measurements?
  • 35. About Me…as a Chemist • I’ve performed a few dozen chemical syntheses • I’ve run thousands of analytical spectra • I’ve generated thousands of NMR assignments • I’ve probably published <5% of all work • Most of it has been lost • But things can be different today…. • But it still needs to be associated with me…
  • 39. So you Want a Profile???
  • 43. Rewards and Recognition Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP. The First Step badge is awarded when a user submits (& has published) their 1st CSSP article.
  • 44. Integrate to instruments and software • Integration to analytical instrumentation vendors already in place – Agilent, Bruker, Thermo, Waters • Also, Cheminformatics vendors link to ChemSpider – Accelrys, ACD/Labs, ChemAxon, iChemLabs, and…
  • 46. PharmaSea • Dereplication via ChemSpider • Segregation of natural products datasets • Analytical data algorithms & integration – Mass spec searching – predicted fragmentation – NMR feature searching – NMR prediction – Computer-assisted structure elucidation
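The mass-spec searching step of dereplication on slide 46 can be sketched as a tolerance search over known monoisotopic masses. This is a minimal sketch under stated assumptions: the three-compound table, the function names, and the 10 ppm default are illustrative, the masses are for neutral molecules, and adducts and fragmentation are ignored. It does not use or represent the ChemSpider API.

```python
# Illustrative reference table: neutral monoisotopic masses in Da.
CANDIDATES = {
    "caffeine": 194.08038,  # C8H10N4O2
    "aspirin": 180.04226,   # C9H8O4
    "glucose": 180.06339,   # C6H12O6
}


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def search_by_mass(observed: float, tolerance_ppm: float = 10.0) -> list:
    """Return names of known compounds within the ppm tolerance."""
    return sorted(
        name
        for name, mass in CANDIDATES.items()
        if abs(ppm_error(observed, mass)) <= tolerance_ppm
    )


hits = search_by_mass(180.0634)
```

A hit means the measured mass is consistent with a compound that is already known, so it can be set aside early; aspirin and glucose share a nominal mass of 180 but are separated by more than 100 ppm, which is why an accurate-mass tolerance matters.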
  • 47. It is so difficult to navigate… What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known Pathways? Working On Now? Connections to disease? Expressed in right cell type? Competitors? IP?
  • 48. • 3-year Innovative Medicines Initiative project • Integrating chemistry and biology data using semantic web technologies • Open source code, open data and open standards • Academics, Pharma companies, Publishers….
  • 49. ChemSpider Contributions • The host of the chemistry services – Supplier of “standardized” chemical data files – Chemistry searching (structure, substructure etc) – Curator and data quality checking • We built the Open PHACTS chemical registration system
  • 50. Open Source Drug Discovery
  • 51. Chemical Database Service • National Chemical Database Service for UK Academics • Integrating Commercial Databases and Services • Chemicals, analytical data, prediction algorithms • Development of data repository
  • 52. Community Repository for Data • Funding agencies encourage sharing of data • Increasing availability of “Open Data” • Institutional repositories no specific domain support • Develop a community repository for chemistry data – private, public, embargoed • Provides data to develop models and algorithms
  • 53. Community Repository for Data • Automated depositions of data • DOI’ed data objects for citation purposes • A database of reference data, but validated by the community • National services feeding the repository – crystallography, mass spectrometry • Integrate to blogging tools for chemistry • Integrate to Electronic Lab Notebooks as feeds
  • 54. Model Building with Community Data • Community data as a basis of model building – Consume data from available databases, community data, new publications and build predictive algorithms for the community – How many algorithms are reported and lost? How much repeat work is done in the domain of algorithmic development?
  • 55. Inside our Publication Archive • How much data is in the archive, in the publications and in the supplementary info? – How many compounds for ChemSpider? – How many syntheses for ChemSpider reactions? – How many characterization measurements? • Property Data • Spectral Data • Graphs and charts to be used for modeling?
  • 56. What if we could capture it all? Digitally Enhancing the RSC Archive
  • 57. Start with data in publications
  • 59. ChemSpider Reactions • Starting with data from CSSP, MOS and CCR • Will cover reactions extracted from: • Patents • RSC journal articles and ESI
  • 60. E-Lab Notebooks • Integration between ELNs and: • ChemSpider • ChemSpider Reactions • Chemistry Data Repository
  • 61. The Future • Internet Data • Commercial Software • Pre-competitive Data • Open Science • Open Data • Publishers • Educators • Open Databases • Chemical Vendors • Small organic molecules • Undefined materials • Organometallics • Nanomaterials • Polymers • Minerals • Particle bound • Links to Biologicals
  • 62. The Future of Chemistry on the Web? • Public compound databases federate & build a linked environment of validated data! • Data validation needs are not ignored • Publishers layer on information to make publications discoverable • Open Data proliferate • The “Semantic Web” will continue to develop…
  • 63. Thank you Email: williamsa@rsc.org Twitter: @ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams