
Non-targeted analysis supported by data and cheminformatics delivered via the US-EPA CompTox Chemicals Dashboard


Non-targeted analysis (NTA) uses high-resolution mass spectrometry to better understand the identity of the wide variety of chemicals present in environmental samples and other matrices. However, data processing remains challenging because of the vast number of chemicals detected in samples, the software and computational requirements of data processing, and the inherent uncertainty in identifying chemicals from candidate lists. Analysis of the resulting mass spectrometry data relies on cheminformatics to identify and rank chemicals, and the US EPA has developed functionality within the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) to address these challenges. The tools include generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation reviews how the CompTox Chemicals Dashboard, through its flexible search capabilities, rich data for ~875,000 chemical substances, and visualization approaches, provides a freely available, open chemistry resource to support structure identification and NTA. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

  1. Non-targeted analysis supported by data and cheminformatics delivered via the CompTox Chemicals Dashboard
     Antony Williams (1), Alex Chao (2), Tom Transue (3), Tommy Cathey (3), Elin Ulrich (1) and Jon Sobus (1)
     (1) Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, RTP, NC
     (2) Oak Ridge Institute for Science and Education (ORISE) Research Participant, RTP, NC
     (3) GDIT, Research Triangle Park, North Carolina, United States
     November 2019, SETAC, Toronto, Canada
     http://www.orcid.org/0000-0002-2668-4821
     The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  2. An intro to the Dashboard
     • Freely available web-based database from the National Center for Computational Toxicology
     • Providing data for 875,000 substances, including
       – Experimental and predicted physicochemical properties
       – In vivo toxicity data harvested from dozens of public resources
       – In vitro bioactivity data for thousands of chemicals and assays
       – Exposure data including chemicals in consumer products
       – Real time predictions for >20 physchem and toxicological endpoints
     • Dashboard is used by mass spectrometrists for chemical identification
     • A quick view of general capabilities…
  3. CompTox Chemicals Dashboard: https://comptox.epa.gov/dashboard (875k Chemical Substances)
  4. Detailed Chemical Pages
  5. Access to Chemical Hazard Data
  6. Sources of Exposure to Chemicals
  7. Link Access: links based on chemical identifiers to dozens of online resources, including analytical data
  8. MassBank of North America: https://mona.fiehnlab.ucdavis.edu
  9. “MS-ready” structures
  10. Overview of MS-Ready Structures
      • All structure-based chemical substances are algorithmically processed to
        – Split multicomponent chemicals into individual structures
        – Desalt and neutralize individual structures
        – Remove stereochemical bonds from all chemicals
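The processing steps listed on the MS-Ready slide can be illustrated with a toy, string-level sketch. This is not the Dashboard's algorithm, which the slides do not describe in detail: it works directly on SMILES strings, uses a small hypothetical counterion list (`COUNTERIONS`), and leaves out neutralization entirely, since that step needs valence-aware handling from a cheminformatics toolkit.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately short counterion list (an assumption for this
# sketch); a real workflow would rely on a curated list and a toolkit.
COUNTERIONS = {"[Na+]", "[K+]", "[Li+]", "[Cl-]", "[Br-]", "[I-]", "Cl", "Br"}


def strip_stereo(smiles: str) -> str:
    """Remove tetrahedral (@, @@) and double-bond (/ and backslash) markers."""
    return re.sub(r"[@/\\]", "", smiles)


def ms_ready_components(smiles: str) -> list[str]:
    """Split a multicomponent SMILES, drop listed counterions, strip stereo.

    Neutralization of charged components is NOT attempted here.
    """
    components = [part for part in smiles.split(".") if part]
    kept = [part for part in components if part not in COUNTERIONS]
    unique: list[str] = []
    # Deduplicate while preserving order, so repeated components map once.
    for part in map(strip_stereo, kept):
        if part not in unique:
            unique.append(part)
    return unique
```

For example, `ms_ready_components("C[C@H](N)C(=O)O.Cl")` splits off the hydrochloride and returns a single stereo-free alanine component, so a salt and its parent acid end up under the same searchable structure.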
  11. (no text on this slide)
  12. MS-Ready Mappings Set: all substances containing component
  13. Mass/Formula Searching and Metadata Ranking
  14. Advanced Searches: Mass Search
  15. Advanced Searches: Mass Search
  16. MS-Ready Structures for Formula Search
  17. MS-Ready Mappings
      • EXACT Formula: C10H16N2O8: 3 Hits
  18. MS-Ready Mappings
      • Same Input Formula: C10H16N2O8
      • MS Ready Formula Search: 125 Chemicals
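The gap between the exact-formula and MS-Ready hit counts on these two slides comes from salts and other multicomponent forms mapping back to one MS-Ready structure. A minimal sketch of that idea follows. The `RECORDS` table and the `search` function are assumptions made for illustration; they are not Dashboard data or its API, and the counts they produce are unrelated to the 3 and 125 hits reported on the slides.

```python
from collections import defaultdict

# Illustrative records: (substance, formula as registered, MS-ready formula).
# The EDTA salts are real substances, but this table is an assumption made
# for the sketch and is not taken from the Dashboard.
RECORDS = [
    ("EDTA", "C10H16N2O8", "C10H16N2O8"),
    ("Disodium EDTA", "C10H14N2Na2O8", "C10H16N2O8"),
    ("Calcium disodium EDTA", "C10H12CaN2Na2O8", "C10H16N2O8"),
    ("Caffeine", "C8H10N4O2", "C8H10N4O2"),
]


def build_index(records, column):
    """Map each formula in the chosen column to the substance names."""
    index = defaultdict(list)
    for record in records:
        index[record[column]].append(record[0])
    return index


EXACT_INDEX = build_index(RECORDS, 1)     # formula as registered
MS_READY_INDEX = build_index(RECORDS, 2)  # formula of the MS-ready form


def search(formula, ms_ready=False):
    """Return substance names matching the formula in the chosen index."""
    index = MS_READY_INDEX if ms_ready else EXACT_INDEX
    return list(index.get(formula, []))
```

In this toy table, `search("C10H16N2O8")` finds only the free acid, while `search("C10H16N2O8", ms_ready=True)` also returns the two salts, which is the behavior an analyst wants when the instrument only ever sees the desalted, neutral form.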
  19. Candidate ranking using metadata
  20. Data Source Ranking of “known unknowns”
      • A mass and/or formula search is run for an unknown chemical, but it is a known chemical contained within a reference database
      • Most likely candidate chemicals have the most associated data sources, most associated literature articles, or both
      [Figure labels: C14H22N2O3, 266.16304; Chemical Reference Database; Sorted candidate structures]
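The formula and mass shown on this slide (C14H22N2O3, 266.16304) are linked by a monoisotopic mass calculation, and a mass search then looks within an error window around that value. The sketch below shows both steps. The function names and the 5 ppm tolerance are assumptions for illustration, and the element table is limited to a few elements with rounded isotope masses.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, rounded.
# Only a handful of elements are included in this sketch.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C14H22N2O3'."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the simple pattern could not fully consume.
    if "".join(symbol + count for symbol, count in tokens) != formula:
        raise ValueError(f"Unsupported formula syntax: {formula!r}")
    total = 0.0
    for symbol, count in tokens:
        if symbol not in MONOISOTOPIC:
            raise ValueError(f"Element not in sketch table: {symbol}")
        total += MONOISOTOPIC[symbol] * int(count or 1)
    return total


def ppm_window(mass, ppm=5.0):
    """Return the (low, high) search bounds for a mass and a ppm tolerance."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta
```

`monoisotopic_mass("C14H22N2O3")` gives a value that rounds to 266.16304, matching the slide, and `ppm_window` turns that into the bounds used to pull candidates from a reference database.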
  21. Dashboard Metadata for Ranking
      • Chosen dashboard metadata to rank candidates
        – Associated data sources
          • Lists in the underlying database (more about lists later)
          • Associated data sources in PubChem
          • Specific source types (e.g. water, surfactants, pesticides)
        – Number of associated literature articles (PubMed)
        – Chemicals in the environment: the number of products/categories containing the chemical is an important source of data (from CPDat database)
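Ranking candidates by the metadata listed on this slide can be sketched as a simple sort. The slides do not say how the individual counts are weighted or combined, so the lexicographic ordering below (data sources first, then literature articles, then product categories) is an assumption, as are the `Candidate` class and its field names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate structure with hypothetical metadata counts."""
    name: str
    data_sources: int        # associated data sources / lists
    pubmed_articles: int     # associated literature articles
    product_categories: int  # products/categories containing the chemical


def rank_candidates(candidates,
                    fields=("data_sources", "pubmed_articles",
                            "product_categories")):
    """Sort candidates so the best-supported ones come first.

    Ties on the first field are broken by the next one, and so on.
    """
    return sorted(
        candidates,
        key=lambda c: tuple(getattr(c, field) for field in fields),
        reverse=True,
    )
```

With placeholder candidates such as `Candidate("candidate_A", 12, 40, 3)` and `Candidate("candidate_B", 2, 0, 0)`, the function places `candidate_A` first, reflecting the slide's premise that the most likely "known unknown" is the one with the most associated sources and literature.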
  22. Comparing Search Performance
      • When dashboard contained 720k chemicals
      • Only 3% of ChemSpider size
      • What was the comparison in performance?
  23. SAME dataset for comparison
  24. How did performance compare?
      For the same 162 chemicals, Dashboard outperforms ChemSpider for both Mass and Formula Ranking
  25. Data Quality is important
      • Data quality in free web-based databases!
  26. Will the correct Microcystin LR Stand Up? ChemSpider Skeleton Search
  27. Comparing ChemSpider Structures
  28. Batch Searching mass and formula
  29. Batch Searching
      • Singleton searches are useful but we work with thousands of masses and formulae!
      • Typical questions
        – What is the list of chemicals for the formula CxHyOz
        – What is the list of chemicals for a mass +/- error
        – Can I get chemical lists in Excel files? In SDF files?
        – Can I include properties in the download file?
  30. Batch Searching Formula/Mass
  31. Searching batches using MS-Ready Formula (or mass) searching
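The batch questions above (a list of chemicals for each mass within an error, delivered as a downloadable file) can be sketched against a small local table. The `REFERENCE` table, the function names, the 5 ppm default, and the CSV column names are all assumptions for illustration; the Dashboard's own batch search and export formats are not reproduced here.

```python
import csv
import io

# Small local reference table standing in for a database:
# (name, formula, monoisotopic mass in u, rounded to 5 decimals).
REFERENCE = [
    ("Caffeine", "C8H10N4O2", 194.08038),
    ("Bisphenol A", "C15H16O2", 228.11503),
    ("EDTA", "C10H16N2O8", 292.09067),
]


def batch_mass_search(query_masses, ppm=5.0):
    """Return one row per (query mass, matching chemical) pair."""
    rows = []
    for query in query_masses:
        tolerance = query * ppm / 1e6
        for name, formula, mass in REFERENCE:
            if abs(mass - query) <= tolerance:
                error_ppm = (mass - query) / query * 1e6
                rows.append((query, name, formula, mass, round(error_ppm, 2)))
    return rows


def to_csv(rows):
    """Serialize search rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(
        ["query_mass", "name", "formula", "monoisotopic_mass", "error_ppm"]
    )
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(batch_mass_search([194.0804, 300.0, 292.0905]))` yields a header plus one line each for the caffeine and EDTA matches; the 300.0 query has no candidate within tolerance and contributes no rows.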
  32. Chemical Lists
  33. Chemical Lists
  34. EPAHFR: Hydraulic Fracturing
  35. PFAS lists of Chemicals
  36. Research in Progress
  37. Predicted Mass Spectra: http://cfmid.wishartlab.com/
      • MS/MS spectra prediction for ESI+, ESI-, and EI
      • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard
  38. Search Expt. vs. Predicted Spectra
  39. Search Expt. vs. Predicted Spectra
  40. Spectral Viewer Comparison
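Comparing an experimental spectrum with a predicted one needs some similarity score. The slides do not state which score the prototype uses, so the sketch below swaps in a common choice, binned cosine similarity, purely as an assumption. Spectra are represented as lists of `(m/z, intensity)` pairs, and the bin width is an illustrative parameter.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of peaks falling into the same m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(spectrum_a, spectrum_b, bin_width=0.01):
    """Cosine of the angle between two binned intensity vectors (0 to 1)."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Identical peak lists score 1, and spectra with no shared bins score 0; candidates from a mass or formula search could then be re-ranked by the score of their predicted spectrum against the measured one.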
  41. Prototype Development
  42. Conclusion
      • Dashboard access to data for ~875,000 chemicals
      • MS-Ready data facilitates structure identification
      • Related metadata facilitates candidate ranking
      • Relationship mappings and chemical lists of great utility
      • Dashboard and contents are one part of the solution
      • New developments in progress, especially API development, will be very enabling…
  43. Acknowledgements
      • IT Development team, especially Jeff Edwards and Jeremy Dunne
      • Chris Grulke for the ChemReg system
      • Andrew McEachran (now at Agilent)
      • The curation team focused on data quality
  44. Contact
      Antony Williams
      US EPA Office of Research and Development
      Center for Computational Toxicology and Exposure
      EMAIL: Williams.Antony@epa.gov
      ORCID: https://orcid.org/0000-0002-2668-4821
