Chemical ontologies represent abstractions of chemical compounds, providing structural as well as functional and chemical property classifications. With automated patent text processing, there is also increasing interest in automatically classifying chemical compounds in patent documents to enable chemical searches based on known chemical classes.
We will therefore present strategies to automatically classify chemical compounds based on their names and their chemical structure or function, using a chemical ontology that is derived from the purely lexical resources MeSH and ChEBI but additionally incorporates SMARTS-based and chemical-calculation-based logic. We will describe the development of this ontology, which also comprises functional classifications and materials science terms such as alloys and polymers.
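As a minimal sketch of the name-based part of such a classification, the following Python example assigns a compound name to a class and then adds all superclasses by following is-a links. The ontology fragment, the synonym table, the suffix rules and the function names are hypothetical and chosen only for illustration; they are not taken from MeSH, ChEBI or our ontology. Structure-based rules (SMARTS matching) would require a cheminformatics toolkit and are not shown here.

```python
# Hypothetical toy ontology: class -> direct parent classes (is-a links).
IS_A = {
    "carboxylic acid": ["organic acid"],
    "organic acid": ["organic compound"],
    "alcohol": ["organic compound"],
    "brass": ["alloy"],
    "alloy": ["material"],
}

# Hypothetical lexical resources: exact synonyms and name-suffix rules.
LEXICON = {"acetic acid": "carboxylic acid", "brass": "brass"}
SUFFIX_RULES = [("oic acid", "carboxylic acid"), ("anol", "alcohol")]


def ancestors(cls):
    """Return all superclasses of cls reachable through is-a links."""
    found = set()
    stack = list(IS_A.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(IS_A.get(parent, []))
    return found


def classify(name):
    """Return the direct class of a name plus all inherited classes."""
    key = name.strip().lower()
    direct = LEXICON.get(key)
    if direct is None:
        # Fall back to suffix rules when the name is not in the lexicon.
        for suffix, cls in SUFFIX_RULES:
            if key.endswith(suffix):
                direct = cls
                break
    if direct is None:
        return set()  # unknown name: no class assigned
    return {direct} | ancestors(direct)


ethanol_classes = classify("Ethanol")
brass_classes = classify("brass")
```

Because every mention carries its inherited classes, a search for a general class such as "organic compound" also retrieves documents that mention only specific members of that class.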
Using our UIMA-based OCMiner annotation pipeline, over 90 million patent full-text documents were processed to find mentions of chemical compounds, substances, chemical classes and chemical groups. In addition, the claimed uses of these compounds were extracted. Subsequently, the chemical terms were classified with our chemical ontology, transforming more than 10 billion chemical class mentions into an ontology-enabled, Lucene-based search index. This index was also used to analyze the frequency of chemical classes per time period, giving indications of the focus of general chemical research activities and of recent trends in patenting strategies.
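The frequency analysis per time period can be illustrated with a short Python sketch. It assumes that each class mention is available as a (publication year, chemical class) pair; the sample records, the five-year period length and the function name are hypothetical and do not reflect the actual index or its contents.

```python
from collections import Counter, defaultdict

# Hypothetical sample of (publication year, chemical class) mentions.
mentions = [
    (2008, "alloy"),
    (2008, "polymer"),
    (2009, "polymer"),
    (2013, "polymer"),
    (2014, "alloy"),
    (2014, "polymer"),
]


def class_frequency_by_period(records, period_years=5):
    """Count class mentions per period, keyed by the first year of each period."""
    counts = defaultdict(Counter)
    for year, chem_class in records:
        period_start = year - year % period_years
        counts[period_start][chem_class] += 1
    return counts


counts = class_frequency_by_period(mentions)
for period_start in sorted(counts):
    total = sum(counts[period_start].values())
    for chem_class, n in sorted(counts[period_start].items()):
        # Relative share of a class within its period.
        print(period_start, chem_class, n, round(n / total, 2))
```

Comparing the relative share of a class across periods, rather than raw counts, compensates for the growing number of patent documents per year.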
An annotated data set covering 10 years of US patents is freely available for further investigation and can be used for training and for further developing the use, quality and interchangeability of chemical ontologies.