Integrating an Analytical Methods and Mass Spectral Database with Cheminformatics Capabilities

US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
Innovative Research for a Sustainable Future
www.epa.gov/research
Gregory Janesch¹, Erik Carr¹, Vicente Samano², Brian Meyer² and Antony Williams³
1. ORAU Student Services Contractor to Center for Computational Toxicology & Exposure, Office of Research & Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
2. Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, USA
3. Center for Computational Toxicology & Exposure, Office of Research & Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
ACS West
San Francisco, CA
August 13-17, 2023
Data
There are three kinds of data in the database.
- Fact sheets are results-oriented documents with data on one or more
substances, ranging from basic descriptions of health effects to
monographs with NMR, Raman, and IR spectra.
- Methods document an end-to-end analytical procedure for one or more
substances, sometimes hundreds of chemicals. The documents are curated
to extract the chemical compounds and then annotated with information
such as matrix and methodology.
- Spectra are stored as lists of m/z-intensity pairs with associated
parameters.
Records also carry metadata such as experimental conditions, authors,
and a synopsis of the method or fact sheet; the exact fields depend on
the kind of record.
Data are open access and are derived from a variety of sources.
These include online spectral databases, vendor methods, research
groups, EPA databases and other government agencies.
At the time of writing, the database contains approximately:
- 165,000 spectra (plus 600,000 externally linked spectra)
- more than 700 fact sheets
- more than 3,300 methods
Description
A large variety of sources for spectra, documented analytical
procedures and methods, and other associated documentation exist
and are, in theory, easily available with the usual web search.
However, these sources are largely isolated from each other, hard to
find via general searches because of inconsistencies in chemical
names and identifiers, and highly varied in format.
To address these challenges, the Analytical Methods and Open
Spectra (AMOS) web application has been developed. AMOS is a
database and associated web-based application containing several
types of records searchable by common identifiers known to
chemists (i.e., CASRNs, InChIKeys and chemical names).
Acknowledgements
The authors thank the data curation team for their rigorous work in
annotating and identifying information in the records. Chemical data
extraction, curation and annotation are an essential part of this work.
General Searching
The primary search function searches all records for a single chemical
substance. One half of the page (Fig. 1) shows the searched compound
(assuming a match), along with a table of the records containing that
substance. For each record the table gives the data source, the
associated methodology, and a short description of the record itself.
Selecting a row in that table opens the record for closer viewing,
whether that means opening an analytical method or displaying a
spectrum.
Spectrum Search
For spectral data, an additional search option is available. If a mass
range, a methodology, and a spectrum (as x,y pairs) are supplied, AMOS
returns the spectra matching that mass range and methodology, ranked by
their similarity to the user-supplied spectrum (see Fig. 2). The top
table lists the substance associated with each spectrum found (with its
DTXSID), the similarity of that spectrum, and a description of it.
Below that table is an interactive plot overlaying the two spectra.
Method Searches
AMOS contains two functions for searching for methods. One is a simple
table that lists all methods in the database (not pictured). This list can be
filtered by several fields including matrix, analyte, and method name,
allowing for quick discovery of methods that cover a known topic.
The other, shown in Fig. 3, is a search for methods containing similar
substances, which provides a starting point even for chemicals without
methods of their own. If methods exist for the searched substance, they
are returned. If none exist, AMOS returns all methods that contain at
least one substance whose Tanimoto structural similarity coefficient to
the searched substance is sufficiently high. In the example in Fig. 3,
the drug only became available in 2015, so there has been relatively
little time to develop and publish methods for it.
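The fallback logic of the similar-substance method search can be sketched as follows. This is an illustration under stated assumptions, not the AMOS implementation: fingerprints are assumed to be precomputed by a cheminformatics toolkit and represented as sets of on-bit indices, the 0.5 threshold is a placeholder (the poster does not state the cutoff), and all names and data structures are hypothetical. The Tanimoto coefficient itself is the standard ratio of shared bits to total distinct bits.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def find_methods(query_id, methods, fingerprints, threshold=0.5):
    """Return methods for a substance, or methods for similar substances.

    methods:      dict mapping method name -> set of substance identifiers
    fingerprints: dict mapping substance identifier -> set of on-bit indices
    threshold:    hypothetical similarity cutoff for the fallback search
    """
    # First try an exact match on the searched substance.
    exact = [name for name, subs in methods.items() if query_id in subs]
    if exact:
        return exact
    # No methods exist: fall back to methods covering a similar substance.
    query_fp = fingerprints[query_id]
    return [
        name
        for name, subs in methods.items()
        if any(tanimoto(query_fp, fingerprints[s]) >= threshold for s in subs)
    ]
```

Returning the exact matches first and only then falling back mirrors the two-step behaviour described in the text; a real system would likely also report the similarity score and the matching substance alongside each method.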
Disclaimers
This tool is currently internal to the U.S. EPA and still under
development. Plans to release it to the public have not been finalized,
but the hope is to complete the process by early 2024. The data used in
this application have not been thoroughly reviewed by the EPA, and users
need to exercise judgement in their use of the results.
The views expressed in this poster are those of the authors and do not
necessarily reflect the views or policies of the U.S. EPA.
Figure 1: The list of methods and LC-MS or GC-MS spectra associated with
perfluorooctanesulfonic acid (PFOS).
Figure 2: A spectral similarity search result, showing the similarity
match for the spectra and the list of associated chemical compounds.
Figure 3: When a search finds no methods for a chemical, its structure is
passed to a Tanimoto structural similarity search, which returns methods
containing similar structures.