ChemSpider – disseminating data
   and enabling an abundance of
            chemistry platforms

 Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey
     Pshenichnov, Dmitry Ivanov, Colin Batchelor, Jon Steele
                                           and David Sharpe


                                ACS New Orleans April 2013
ChemSpider
• >28.5 million unique chemicals from >400
  data sources
• Focus on improving data quality, enhancing
  functionality, integrating and enabling
Some usage statistics
• ca. 200 visitors at any one time, ~30,000 visits per day
• Mar 4-Apr 3, 2013
   – Visits = 731,656
   – Unique Visitors = 527,008
• Independent servers to support other projects
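As a rough cross-check of the figures above, the monthly totals can be turned into per-day and per-visitor averages. This is only a sketch: it assumes the Mar 4-Apr 3 window includes both end dates, which the slide does not state. The daily average comes out somewhat under the "~30,000" quoted, which the slide gives only as an approximation.

```python
from datetime import date

# Figures quoted on the slide for Mar 4 - Apr 3, 2013.
VISITS = 731_656
UNIQUE_VISITORS = 527_008

# Assumption: both end dates are part of the reporting window.
days = (date(2013, 4, 3) - date(2013, 3, 4)).days + 1

visits_per_day = VISITS / days
visits_per_visitor = VISITS / UNIQUE_VISITORS

print(f"{days} days, {visits_per_day:,.0f} visits/day, "
      f"{visits_per_visitor:.2f} visits per unique visitor")
```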
Access ChemSpider
• APIs
  – Programmatic access used by Mobile Apps, Funded
    Consortia projects, many Academic groups
• Widgets
  – UI components for embedding in other websites
• Data
  – Data access, downloads, reuse, licensing
Supporting the Semantic Web
  rdf.chemspider.com/CSID
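The address pattern above can be read as a template in which CSID stands for a numeric ChemSpider ID. A minimal sketch of filling in that template follows; the `http://` scheme and the helper name `rdf_url` are assumptions for illustration, and no request is sent.

```python
def rdf_url(csid: int) -> str:
    """Build an RDF address from the rdf.chemspider.com/CSID pattern.

    The http:// scheme is an assumption; the slide shows only host and path.
    """
    if isinstance(csid, bool) or not isinstance(csid, int) or csid <= 0:
        raise ValueError("CSID must be a positive integer")
    return f"http://rdf.chemspider.com/{csid}"


# 2157 is used here purely as an example identifier.
print(rdf_url(2157))
```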
ChemSpider Resources for Chemistry
ChemSpider Audiences
Simplified interface
From this….. …..to this
Substance Pages
It is so difficult to navigate…
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Competitors?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• IP?
• 3-year knowledge management IMI project

• Integrating chemistry and biology data and delivering
  using semantic web technologies

• Open source code, open data and open standards

• Academics, Pharma companies, Publishers….
ChemSpider Contributions
• The host of the chemistry services
  – Supplier of “standardized” chemical data files
  – Chemistry searching (structure, substructure etc)
  – Provider of data in RDF format
  – Curator and data quality checking
• Now building the Open PHACTS chemical
  registration system
ChemSpider Contributions
• Supplier of chemistry UI components
• “Quality Police” for data checking
• Chemical Validation and Standardization Platform
• Nanopublications from RSC publications
• FP7 Initiative. PharmaSea: increasing value and flow in
  the marine biodiscovery pipeline
PharmaSea
• Dereplication via ChemSpider
• Segregation of natural products datasets
• Analytical data algorithms & integration
  – Mass spec searching – predicted fragmentation
  – NMR feature searching – NMR prediction
  – Computer-assisted structure elucidation
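To make the dereplication idea concrete, here is a minimal sketch of matching an observed mass against known compounds within a ppm tolerance. It is not ChemSpider's or PharmaSea's actual method: the tiny reference list, the function names and the 10 ppm default are all assumptions for illustration, and the observed value is treated as a neutral monoisotopic mass rather than an adduct m/z.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(formula: dict[str, int]) -> float:
    """Sum element masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


# Illustrative local reference list, not ChemSpider data.
KNOWN = {
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
    "theobromine": {"C": 7, "H": 8, "N": 4, "O": 2},
    "aspirin": {"C": 9, "H": 8, "O": 4},
}


def dereplicate(observed_mass: float, tol_ppm: float = 10.0) -> list[str]:
    """Return names of known compounds whose mass lies within tol_ppm."""
    hits = []
    for name, formula in KNOWN.items():
        mass = monoisotopic_mass(formula)
        if abs(mass - observed_mass) / mass * 1e6 <= tol_ppm:
            hits.append(name)
    return hits


print(dereplicate(180.0647))
```

Theobromine and aspirin share a nominal mass of 180 but differ by about 0.02 Da, so a tight ppm window is what separates an already-known compound from a potentially new one.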
Integrate to instruments and software
• Integration to analytical instrumentation vendors
  already in place
  – Agilent, Bruker, Thermo, Waters


• Also, Cheminformatics vendors link to ChemSpider
  – Accelrys, ACD/Labs, ChemAxon, iChemLabs, and…
Natural Products Updates
• Names hard, Structures
  “Obvious”

• New content based on
  monthly updates of the
  database

• Click through to the Natural
  Products Updates entry
National Chemical Database Service
• National Chemical Database Service
  for UK Academics
• Integrating Commercial Databases
  and Services

• Chemicals, analytical data,
  prediction algorithms

• Development of data repository
Retrosynthetic Analysis
Publications - a summary of work
• Scientific publications are a summary of work
  – Is all work reported?
  – How much science is lost to pruning?
  – What of value sits in notebooks and is lost?
• How much data is lost?
  – How many compounds never reported?
  – How many syntheses fail or succeed?
  – How many characterization measurements?
Community Repository for Data
• Funding agencies encourage sharing of data
• Increasing availability of “Open Data”
• Institutional repositories offer no specific domain
  support
• Develop a community repository for chemistry
  data – private, public, embargoed
• Provides data to develop models/algorithms
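The three access states named above (private, public, embargoed) could be modelled as below. This is a hypothetical sketch, not the repository's actual schema: the class names, the fields and the rule that an embargoed deposit opens on its embargo date are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass
class Deposit:
    """Hypothetical record for one deposited data object."""
    title: str
    visibility: Visibility
    embargo_until: Optional[date] = None

    def is_public_on(self, day: date) -> bool:
        # Public deposits are always visible; private ones never are.
        if self.visibility is Visibility.PUBLIC:
            return True
        # Assumed rule: an embargoed deposit opens on its embargo date.
        if self.visibility is Visibility.EMBARGOED:
            return self.embargo_until is not None and day >= self.embargo_until
        return False


example = Deposit("example dataset", Visibility.EMBARGOED, date(2014, 1, 1))
print(example.is_public_on(date(2013, 4, 7)), example.is_public_on(date(2014, 1, 1)))
```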
Community Repository for Data
• Automated depositions of data
• DOI’ed data objects for citation purposes
• A database of reference data, but validated by
  the community
• National services feeding the repository –
  crystallography, mass spectrometry
• Integrate to blogging tools for chemistry
• Integrate to Electronic Lab Notebooks as feeds
Model Building with Community Data
• Community data as a basis of model building
  – Consume data from available databases, community
    data, new publications and build predictive
    algorithms for the community
  – How many algorithms are reported and lost? How
    much repeat work is done in the domain of
    algorithmic development?
Recognition on Data

IC50 Measurements for 62 substituted benzoxazoles
ChemSpider Data Repository: DOI: 10.1356/CSID784.4
Integrate to electronic lab notebooks
E-Lab Notebooks
• Previous work with IDBS and
  University of Cambridge
• Working on LabTrove integration
  with U. Southampton
• Integration between ELNs and:
   • ChemSpider
   • ChemSpider Reactions
   • CDS Repository
• Publish data from ELNs and issue DOIs
• Data aggregated into fully indexed
  ESI format for publication
Support for Chemical Reactions



• Integrating mined reaction data from patents
  (Daniel Lowe)
• Will also incorporate and integrate: Methods
  of Organic Synthesis, Catalysts and Catalyzed
  Reactions and…
Micro-publishing Chemical Reactions
ChemSpider SyntheticPages
Retrosynthetic Analysis
Inside our Publication Archive
• How much data is in the archive, in the
  publications and in the supplementary info?
  – How many compounds for ChemSpider?
  – How many syntheses for ChemSpider reactions?
  – How many characterization measurements?
     • Property Data
     • Spectral Data
     • Graphs and charts to be used for modeling?
What if we could capture it all?
Digitally Enhancing the RSC Archive
Start with data in publications
Recent Work
Comparison of Spectra
Data Validation and Curation Required
CVSP: Validation and Standardization
Data Validation and Curation Required

   Encouraging Participation with
    Rewards and RECOGNITION
Manual Curation
• Integrated commenting, curating and validation
  platform across ALL eScience and publishing
  platforms
• All integrated to a central RSC profile and
  feeding the AltMetrics tools
Structure Review
Where we are now…
Rewards and Recognition
            The First Step badge is
            awarded when a user
            submits (& has published)
            their 1st CSSP article.

Congratulations! Your 1st CSSP article
has been published. Philosopher Lao
Tzu said “A journey of a thousand
miles begins with a single step”. In the
same way we hope that this will be
the first of many submissions that you
make to CSSP.
Future Recognition in AltMetrics?




                         ChemSpider
Why is ChemSpider “different”
• Interfaces for integration
• Sharing of data – and increasingly open
• Open for community participation
  – Deposition
  – Annotation
  – Curation
• We are clear…the world is changing
The Future
                      Internet Data




Small organic molecules               Commercial Software
Undefined materials                   Pre-competitive Data
Organometallics                             Open Science
Nanomaterials                                  Open Data
Polymers                                       Publishers
Minerals                                       Educators
Particle bound                            Open Databases
Links to Biologicals                    Chemical Vendors
Acknowledgments
• The RSC eScience and infrastructure teams
• Our data providers, depositors, collaborators
  and curators
• Daniel Lowe for Reaction Data
• William Brouwer, Penn State
• Software providers – OpenEye, ChemDoodle,
  ACD/Labs, GGA Software, Open Source (Jmol,
  JSpecView, OpenBabel)
Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

ChemSpider – disseminating data and enabling an abundance of chemistry platforms

  • 1. ChemSpider – disseminating data and enabling an abundance of chemistry platforms Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov, Dmitry Ivanov, Colin Batchelor, Jon Steele and David Sharpe ACS New Orleans April 2013
  • 2. ChemSpider • >28.5 million unique chemicals from >400 data sources • Focus on improving data quality, enhancing functionality, integrating and enabling
  • 4. Some usage statistics • ca. 200 visitors at any one time, ~30,000 visits per day • Mar 4-Apr 3, 2013 – Visits = 731,656 – Unique Visitors = 527,008 • Independent servers to support other projects
  • 5. Access ChemSpider • APIs – Programmatic access used by Mobile Apps, Funded Consortia projects, many Academic groups • Widgets – UI components for embedding in other websites • Data – Data access, downloads, reuse, licensing
  • 6. Supporting the Semantic Web rdf.chemspider.com/CSID
  • 9. ChemSpider Audiences Simplified interface …..to this From this…..
  • 12. It is so difficult to navigate… IP? What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known Pathways? Competitors? Working On Now? Connections to disease? Expressed in right cell type?
  • 13. • 3-year knowledge management IMI project • Integrating chemistry and biology data and delivering using semantic web technologies • Open source code, open data and open standards • Academics, Pharma companies, Publishers….
  • 14. ChemSpider Contributions • The host of the chemistry services – Supplier of “standardized” chemical data files – Chemistry searching (structure, substructure etc) – Provider of data in RDF format – Curator and data quality checking • Now building the Open PHACTS chemical registration system
  • 15. ChemSpider Contributions • Supplier of chemistry UI components • “Quality Police” for data checking • Chemical Validation and Standardization Platform • Nanopublications from RSC publications
  • 16. • FP7 Initiative. PharmaSea: increasing value and flow in the marine biodiscovery pipeline
  • 17. PharmaSea • Dereplication via ChemSpider • Segregation of natural products datasets • Analytical data algorithms & integration – Mass spec searching – predicted fragmentation – NMR feature searching – NMR prediction – Computer-assisted structure elucidation
  • 18. Integrate to instruments and software • Integration to analytical instrumentation vendors already in place – Agilent, Bruker, Thermo, Waters • Also, Cheminformatics vendors link to ChemSpider – Accelrys, ACD/Labs, ChemAxon, iChemLabs, and…
  • 19. Natural Products Updates • Names hard, Structures “Obvious” • New content based on monthly updates of the database • Click through to the Natural Products Updates entry
  • 21. Chemical Database Service • National Chemical Database Service for UK Academics • Integrating Commercial Databases and Services • Chemicals, analytical data, prediction algorithms • Development of data repository
  • 23. Publications - a summary of work • Scientific publications are a summary of work – Is all work reported? – How much science is lost to pruning? – What of value sits in notebooks and is lost? • How much data is lost? – How many compounds never reported? – How many syntheses fail or succeed? – How many characterization measurements?
  • 24. Community Repository for Data • Funding agencies encourage sharing of data • Increasing availability of “Open Data” • Institutional repositories offer no specific domain support • Develop a community repository for chemistry data – private, public, embargoed • Provides data to develop models/algorithms
  • 25. Community Repository for Data • Automated depositions of data • DOI’ed data objects for citation purposes • A database of reference data, but validated by the community • National services feeding the repository – crystallography, mass spectrometry • Integrate to blogging tools for chemistry • Integrate to Electronic Lab Notebooks as feeds
  • 26. Model Building with Community Data • Community data as a basis of model building – Consume data from available databases, community data, new publications and build predictive algorithms for the community – How many algorithms are reported and lost? How much repeat work is done in the domain of algorithmic development?
  • 27. Recognition on Data IC50 Measurements for 62 substituted benzoxazoles ChemSpider Data Repository: DOI: 10.1356/CSID784.4
  • 28. Integrate to electronic lab notebooks
  • 29. E-Lab Notebooks • Previous work with IDBS and University of Cambridge • Working on LabTrove integration with U. Southampton • Integration between ELNs and: • ChemSpider • ChemSpider Reactions • CDS Repository • Publish data from ELNs, issue DOIs • Data aggregated into fully indexed ESI format for publication
  • 30. Support for Chemical Reactions • Integrating mined reaction data from patents (Daniel Lowe) • Will also incorporate and integrate: Methods of Organic Synthesis, Catalysts and Catalyzed Reactions and…
  • 34. Inside our Publication Archive • How much data is in the archive, in the publications and in the supplementary info? – How many compounds for ChemSpider? – How many syntheses for ChemSpider reactions? – How many characterization measurements? • Property Data • Spectral Data • Graphs and charts to be used for modeling?
  • 35. What if we could capture it all? Digitally Enhancing the RSC Archive
  • 36. Start with data in publications
  • 39. Data Validation and Curation Required
  • 40. CVSP: Validation and Standardization
  • 41. Data Validation and Curation Required Encouraging Participation with Rewards and RECOGNITION
  • 42. Manual Curation • Integrated commenting, curating and validation platform across ALL eScience and publishing platforms • All integrated to a central RSC profile and feeding the AltMetrics tools
  • 44. Where we are now…
  • 45. Rewards and Recognition The First Step badge is awarded when a user submits (& has published) their 1st CSSP article. Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP.
  • 46. Future Recognition in AltMetrics? ChemSpider
  • 47. Why is ChemSpider “different”? • Interfaces for integration • Sharing of data – and increasingly open • Open for community participation – Deposition – Annotation – Curation • We are clear…the world is changing
  • 48. The Future • Internet Data • Small organic molecules • Commercial Software • Undefined materials • Pre-competitive Data • Organometallics • Open Science • Nanomaterials • Open Data • Polymers • Publishers • Minerals • Educators • Particle bound • Open Databases • Links to Biologicals • Chemical Vendors
  • 49. Acknowledgments • The RSC eScience and infrastructure teams • Our data providers, depositors, collaborators and curators • Daniel Lowe for Reaction Data • William Brouwer, Penn State • Software providers – OpenEye, ChemDoodle, ACD/Labs, GGA Software, Open Source (Jmol, JSpecView, OpenBabel)
  • 50. Thank you Email: williamsa@rsc.org Twitter: ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams