This document covers the motivation, technical aspects, rules, examples, and results of using tautomer generation to expand chemical structure databases for in-silico screening. The aim is easily extensible, scriptable software that integrates tautomer generation into existing virtual screening workflows. The key points are:
1) Tautomeric states can impact biological interactions, but are often overlooked in virtual screening software. CACTVS was modified with 18 predefined tautomer rules and a scriptable interface to automate tautomer generation.
2) Applying the rules to several databases resulted in significant expansions, from 2.2 to 3.6 times the original size. Benchmarks showed the method could process over 500 compounds per
Advanced Tautomer Databases for in-silico Screening
1. Tautomers - Advanced Databases for
in-silico Screening
Frank Oellien, Rhein-Main-Docking Meeting 2003
2. Overview
• Motivation
• Technical Aspects
• Rules and Examples
• Results (Database Size and Benchmarks)
• Workflow
• Summary and Future Tasks
3. Motivation I
• Tautomeric states can be relevant for biological interactions
  • Brandstetter et al., MMP-8 inhibitors, J. Biol. Chem. 276, 2001, 17405-12.
  • Pospisil et al., Ligands of herpesviral thymidine kinases, Helv. Chim. Acta 85, 2002, 3237-50.
• Software for virtual screening or docking addresses:
  • conformations
  • ionization states
  • stereo centers
  • tautomers: X (not addressed)
4. Motivation II
Tautomer Generation Applications (state of the art)
• Agent 2.0 (ETH)
• OEChem (OpenEye)
• StereoPlex (Tripos)
• no extension by means of user-defined rules
• no tautomer-sensitive duplicate check
Aim: Easily extensible and scriptable software that
allows the integration and automation of tautomer
generation in our existing screening workflow.
CACTVS: Chemical data management system
5. Technical Aspects (CACTVS) I
• Flexible, modular chemical data management
system
• C core library, Tcl command layer
• Main command:
ens transform $eh $tlist <direction> <reactionmode> <flags>
<overlapmode> <excludelist> <maxtautomers> <timeout>
tlist:
Transformation definition (SMIRKS line notation,
Daylight)
[#1:1][O:2][C:3]#[N:4]>>[O:2]=[C:3]=[N:4][#1:1]
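[Figure: structure drawings of the two tautomers related by this transformation, cyanic acid (H-O-C≡N) and isocyanic acid (O=C=N-H)]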
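To make the atom mapping in the SMIRKS line concrete, the following Python sketch (not CACTVS code) splits the transformation at `>>` and reads the atom-map numbers on each side. It assumes a linear, unbranched pattern like the one above. The helper names `mapped_atoms`, `chain_order` and `hydrogen_neighbour` are made up for illustration, and this is not a general SMIRKS parser.

```python
import re

# Transformation from the slide: the mapped hydrogen (#1, map number 1)
# moves from the oxygen to the nitrogen.
SMIRKS = "[#1:1][O:2][C:3]#[N:4]>>[O:2]=[C:3]=[N:4][#1:1]"


def mapped_atoms(side):
    """Return {map_number: atom_expression} for the bracket atoms of one side."""
    # Matches bracket atoms such as [O:2] or [#1:1].
    return {int(num): expr for expr, num in re.findall(r"\[([^\]:]+):(\d+)\]", side)}


def chain_order(side):
    """Map numbers in the order they are written (valid for linear patterns only)."""
    return [int(num) for num in re.findall(r":(\d+)\]", side)]


def hydrogen_neighbour(side):
    """Atom expression of the atom written next to the mapped hydrogen.

    Assumption: the pattern is an unbranched chain, so the monovalent
    hydrogen sits at one end of it.
    """
    atoms = mapped_atoms(side)
    order = chain_order(side)
    h_map = next(num for num, expr in atoms.items() if expr == "#1")
    i = order.index(h_map)
    j = i + 1 if i == 0 else i - 1
    return atoms[order[j]]


reactant_side, product_side = SMIRKS.split(">>")

# Every mapped atom appears on both sides; only the bonding changes.
same_atoms = mapped_atoms(reactant_side) == mapped_atoms(product_side)
print(same_atoms)
print(hydrogen_neighbour(reactant_side), "->", hydrogen_neighbour(product_side))
```

Because the same map numbers appear on both sides, the transformation only rearranges bonds and moves the hydrogen; no atoms are created or deleted.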
6. Technical Aspects (CACTVS) II
Pre-defined Function with 18 tautomer
transformations: make_tautoset
loop input_file {
    output = make_tautoset molecule_record
    loop output {
        write to output_file
    }
}
• combination of all tautomer transformation rules
• tautomer-sensitive duplicate check
• optional: output of most reasonable tautomer
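The loop and the two first bullets above can be sketched in plain Python. This is a toy illustration, not the CACTVS implementation: structures are plain SMILES strings, the single "rule" is a hand-written lookup table (`ONE_STEP`, hypothetical), and the names `make_tautoset_sketch` and `expand_database` are invented here. The duplicate check assumes that the enumeration reaches the complete tautomer set, so that every member of a set yields the same key.

```python
from collections import deque

# Hypothetical one-step tautomer table standing in for real transformation rules.
ONE_STEP = {
    "OC#N": {"N=C=O"},        # cyanic acid -> isocyanic acid
    "N=C=O": {"OC#N"},
    "CC(=O)C": {"CC(O)=C"},   # acetone -> its enol
    "CC(O)=C": {"CC(=O)C"},
}


def toy_rule(structure):
    """Return the set of structures reachable in one transformation step."""
    return ONE_STEP.get(structure, set())


def make_tautoset_sketch(structure, rules, max_tautomers=100):
    """Apply all rules repeatedly (breadth-first) until no new structure appears."""
    seen = {structure}
    queue = deque([structure])
    while queue and len(seen) < max_tautomers:
        current = queue.popleft()
        for rule in rules:
            for product in sorted(rule(current)):
                if product not in seen and len(seen) < max_tautomers:
                    seen.add(product)
                    queue.append(product)
    return seen


def expand_database(records, rules):
    """Write every tautomer once; skip records whose tautomer set was already seen."""
    seen_keys = set()
    output = []
    for record in records:
        tautoset = make_tautoset_sketch(record, rules)
        # Tautomer-sensitive key: identical for all members of a complete set.
        key = min(tautoset)
        if key in seen_keys:
            continue
        seen_keys.add(key)
        output.extend(sorted(tautoset))
    return output


records = ["OC#N", "N=C=O", "CC(=O)C", "CCO"]
expanded = expand_database(records, [toy_rule])
print(expanded)
```

In this toy input the second record is a tautomer of the first, so it is recognised as a duplicate even though the two strings differ. If `max_tautomers` truncates the enumeration, the key is no longer guaranteed to be shared, which is why the sketch treats completeness as an assumption.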
7. Rules I
18 Pre-defined Tautomer Rules
• simple enol/keto exchange, long-range enol/keto exchanges (including S, Se and Te analogues)
• simple imine transforms
• aromatic heteroatom H shift, long-range aromatic heteroatom aromatic shift
  [Figure: example structures for the H shift (atoms N, O and H labeled)]
• heteroatom hydrogen exchange, long-range heteroatom hydrogen exchange (heteroatoms: N, O, S, Se and Te)
  [Figure: generic scheme with atom classes [N,S,Se,O,Te] and [N,C] and the exchanged H]
8. Rules II
• ketene/ynol exchange (including S, Se and Te analogues)
  [Figure: generic scheme with atom class [O,S,Se,Te] and the exchanged H]
• nitro/aci-nitro transform with ionic or pentavalent nitro group
• cyanuric acid transform
• formamidinesulfonic acid transform (including N, Se and Te analogues)
• HCN transform
• phosphonic acid transform
11. Benchmarks
Platform: SGI Fuel R14000 / 600 MHz, 1 GB RAM
Performance depends on
• nature of the compounds
• number of tautomers
Supplier DB                Compounds/min   Multiplier (expanded / original size)
Maybridge Screening        > 150           2.5
Asinex Platinum            > 250           2.9
VitasM (in-house Stock)    > 560           3
Tripos Leadscreen          > 1400          2.2
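As a rough worked example of what these figures imply, the Python sketch below estimates run time and output size for a database of a given size. The database size `N` is hypothetical and not taken from the slides, and `estimate` is an illustrative helper. Since the rates are reported as lower bounds, the resulting times are upper bounds.

```python
# Benchmark figures from the table above: (compounds per minute, multiplier).
# The rates are lower bounds ("> 150"), so run times derived from them are
# upper bounds.
BENCHMARKS = {
    "Maybridge Screening": (150, 2.5),
    "Asinex Platinum": (250, 2.9),
    "VitasM": (560, 3.0),
    "Tripos Leadscreen": (1400, 2.2),
}


def estimate(n_compounds, rate_per_min, multiplier):
    """Return (maximum run time in minutes, expected number of output structures)."""
    minutes = n_compounds / rate_per_min
    structures = round(n_compounds * multiplier)
    return minutes, structures


N = 100_000  # hypothetical database size, not from the slides
for name, (rate, mult) in BENCHMARKS.items():
    minutes, structures = estimate(N, rate, mult)
    print(f"{name}: at most {minutes / 60:.1f} h, about {structures} structures")
```

The multiplier matters for the downstream 3D databases as much as the throughput does: every input compound becomes two to three stored structures on average.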
12. Virtual Screening Workflow @ Intervet
[Workflow diagram; boxes as they appear on the slide:]
• 2D / 3D Structure DB (MDL)
• PreProcessing
• Tautomer Generation
• Specific 3D Databases (Catalyst, Unity)
• Tautomer-sensitive Duplicate check
• Data Analysis
• Virtual Screening
13. Summary
Outcome
• flexible structure processing capabilities
• easy modification of generator rules via scripting
• platform independence protects long-term usability and investment
• automation and integration into the existing workflow
Technical Limitations
• experimentally known preferences of tautomeric states are covered only by simple rule-based estimates (no energy estimates)
• a separate structure for each tautomer is needed for 3rd-party databases
14. Future Tasks
• evaluate hit-retrieval within tautomer databases
for different ligand / protein complexes
• full integration and automation of the application
into in-house virtual screening workflow
• coping with data increase
• additional sets of scripts for ionization states and
stereoisomerism