Molecular Shape Fingerprints and Fragment Analysis
1. Fragment Database Analysis Using
Molecular Shape Fingerprints
John D. MacCuish,
Norah E. MacCuish,
Michael Hawrylycz, and Mitch Chapman
ACS San Francisco 2010
CINF
john.maccuish@mesaac.com
2. Outline
• Shape Fingerprints: quasi-Monte
Carlo integration approach
• 2D substructure commonality
• 3D shape and pharmacophore
features analogue
• Fragment Database Example
• Future Work
3. quasi-Monte Carlo Integration (QMC)*
• Approximate a 3D volume -- e.g., CPK
van der Waals
• In practice, quasi-randomly generated
points have best error convergence in
low dimensions.
• Align volumes using binomial sampling
= shape fingerprints, and SVD
• Fast and accurate
*"Quasi-Monte Carlo integration for the fast and effective generation of molecular shape
fingerprints", ACS San Francisco, Wednesday 2:30, COMP 346
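To make the volume-approximation idea concrete, here is a minimal sketch of quasi-Monte Carlo integration over a union of atom-centered spheres. It assumes a Halton low-discrepancy sequence (bases 2, 3, 5) as the quasi-random generator; the slides do not say which sequence is used, and the function names, the point count, and the radii in the example are illustrative assumptions rather than the authors' implementation.

```python
def halton(index, base):
    """Radical-inverse (van der Corput) value for a 1-based index in the given base."""
    result, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result


def qmc_volume(spheres, n_points=20000):
    """Estimate the volume of a union of spheres given as (x, y, z, radius).

    Quasi-random points are drawn in the axis-aligned bounding box; the
    volume is the box volume times the fraction of points inside any sphere.
    """
    lo = [min(s[i] - s[3] for s in spheres) for i in range(3)]
    hi = [max(s[i] + s[3] for s in spheres) for i in range(3)]
    box_volume = 1.0
    for a, b in zip(lo, hi):
        box_volume *= b - a

    inside = 0
    for k in range(1, n_points + 1):
        # One Halton coordinate per axis, scaled into the bounding box.
        p = [lo[i] + (hi[i] - lo[i]) * halton(k, base)
             for i, base in enumerate((2, 3, 5))]
        if any(sum((p[i] - s[i]) ** 2 for i in range(3)) <= s[3] ** 2
               for s in spheres):
            inside += 1
    return box_volume * inside / n_points


# Hypothetical three-atom example; coordinates and radii are made up.
atoms = [(0.0, 0.0, 0.0, 1.5), (1.0, 0.0, 0.0, 1.2), (-0.3, 0.9, 0.0, 1.2)]
print(round(qmc_volume(atoms), 2))
```

A single unit sphere is a convenient sanity check, since its exact volume is 4π/3.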
5. Four subfingerprints
• Find maximum alignment to other fingerprints of
conformations similarly sampled
• Maximum Tanimoto for best alignment
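The "maximum Tanimoto" comparison can be sketched as follows. This assumes each conformation carries several sub-fingerprints stored as sets of on-bit indices, and that the best alignment is taken as the highest-scoring pair across all sub-fingerprints; the pairing scheme and the function names are assumptions for illustration, not something the slides specify.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def best_alignment(query_subfps, target_subfps):
    """Return (score, query_index, target_index) for the best-scoring pair."""
    return max(
        (tanimoto(q, t), i, j)
        for i, q in enumerate(query_subfps)
        for j, t in enumerate(target_subfps)
    )


# Hypothetical sub-fingerprints: one for the query, four for the target.
query = [{1, 2, 3, 4}]
target = [{1, 2}, {1, 2, 3, 4, 5}, {7}, {1, 2, 3}]
print(best_alignment(query, target))
```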
6. 2D Substructure Commonality
• Exploratory visualization tool on a series or a 2D
cluster of structures.
• Akin to loosening the constraint on a maximal
common substructure (MCS).
• Path-based 2D fingerprint form: “Stigmata”*
• Key-based 2D fingerprint form: “ChemTattoo”
*"Stigmata: An Algorithm To Determine Structural Commonalities in Diverse Datasets",
Shemetulskis et al., JCIM, 36(4), 1996, pp. 862-871
7. 2D Substructure Commonality
2D 768 Key-based Fingerprint
"Substructure commonality analysis and visualization with new key-based binary fingerprinter", Norah MacCuish, ACS
Chicago, CINF Session, March 24-28, 2007.
8. 2D ChemTattoo
• Generate the modal fingerprint from the input data set
• Modal fingerprint is the same length as the input data
set 2D fingerprints
• A bit is set in the Modal if that ‘key’ occurs in at
least the threshold fraction of input molecules
• Compare each input 2D fingerprint against the Modal
Fingerprint
• Calculate atom score (counts) for each atom for each
input structure that reflects the number of modal keys
that a given atom participates in. Color code the
scores and depict the 2D structures.
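The steps above can be sketched in a few lines. The sketch assumes fingerprints are stored as sets of on-key indices, that the threshold is a fraction of the input molecules, and that a per-molecule mapping from each key to the atoms it matched is available from the fingerprinter. The `key_to_atoms` mapping and both function names are hypothetical.

```python
def modal_fingerprint(fingerprints, threshold):
    """Keys that occur in at least `threshold` (a fraction) of the fingerprints."""
    n = len(fingerprints)
    counts = {}
    for fp in fingerprints:
        for key in fp:
            counts[key] = counts.get(key, 0) + 1
    return {key for key, c in counts.items() if c / n >= threshold}


def atom_scores(key_to_atoms, modal, n_atoms):
    """Count, for each atom, how many modal keys it participates in.

    key_to_atoms maps a key index to the atom indices that key matched
    in one molecule (assumed to be supplied by the fingerprinter).
    """
    scores = [0] * n_atoms
    for key, atoms in key_to_atoms.items():
        if key in modal:
            for atom in set(atoms):
                scores[atom] += 1
    return scores


# Four made-up fingerprints, as sets of on-key indices.
fps = [{1, 2, 3}, {2, 3, 4}, {3, 5}, {1, 5, 6}]
modal = modal_fingerprint(fps, 0.5)
print(sorted(modal))
print(atom_scores({1: [0, 1], 2: [1, 2], 3: [2]}, modal, n_atoms=4))
```

The resulting per-atom counts are what would be color coded on the 2D depiction; atoms with a score of zero take part in none of the modal keys.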
9. 2D ChemTattoo
Four 2D fingerprints
Threshold set to 1.0 -- no bit in common among all 4 fingerprints
Threshold set to 0.5 -- some bits in common among at least 2 of the 4 fingerprints
11. Pharmacophore
Extension
• Adding a pharmacophore extension to quasi-
Monte Carlo generated shape fingerprints.
• Substructure matching with pharmacophore
features with user defined SMARTS
• Create a ChemTattoo analogue in 3D, map
the features onto the shape fingerprint.
• Apply to 3D shape clusters, similarity
searching, etc.
12. ChemTattoo 3D
• Allow the pharmacophore features to include standard
definitions (HBond donor, HBond acceptor, etc.) or a
user-defined set of definitions (SMARTS based)
• Perform shape fingerprint clustering and analyze the
resulting clusters to perceive patterns (modal) in the
pharmacophore feature space
• Use a known target as the modal and query a database
to find similar shape (based on shape fingerprints) and
align the shape hits based on the pharmacophore
features of the target.
• Apply these ideas to fragments to find potential
bioisosteres for an active fragment found from a
fragment screen
13. Fragment Database
• ZINC Fragment database, ~500K
compounds
• Cluster in 2D using 768 MACCS Keys
Fingerprints
• Select the Representatives to create a 2D
diverse set
• Generate multiple conformations (< 5 kcal/mol)
and Shape Fingerprints for all conformers
• Shape Fragment Database contains: 3,265
structures, 24,029 conformers
14. Bacterial 23S rRNA Fragment Screen
1. Generate conformers for the active fragment
2. Search Fragment Database w/ Shape FP cutoff of
0.6: 324 conformers share the same shape
3. Identify the Pharmacophore Features in Target
(features can be user defined)
4. Score the Fragment 3D database based on the
number of modal pharmacophore features for
fragments within the shape cutoff (require at
least two pharmacophore matches)
5. Display the highlighted pharmacophore features
in the target with a surface overlay
6. Align the hits via shape -- slider bars display
shape matches that also have matching features
within the slider bar distance
[Figure labels: Kd > 100 µM; MS]
Nature Reviews Drug Discovery v.3 8/04, p. 669
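Steps 2 and 4 of this workflow (shape cutoff, then scoring by matched pharmacophore features with a minimum of two matches) might look like the sketch below. It assumes features are compared by type only, that the shape cutoff is inclusive, and that hits are ranked by match count and then by shape similarity; the data layout and names are hypothetical.

```python
from collections import Counter


def score_hits(target_features, candidates, shape_cutoff=0.6, min_matches=2):
    """Rank candidate conformers by the number of matched target features.

    candidates: iterable of (name, shape_similarity, feature_type_list).
    Only conformers at or above the shape cutoff are scored, and only those
    with at least `min_matches` feature matches are kept.
    """
    target = Counter(target_features)
    hits = []
    for name, shape_sim, features in candidates:
        if shape_sim < shape_cutoff:
            continue
        # Multiset intersection: each target feature can be matched once.
        matches = sum((target & Counter(features)).values())
        if matches >= min_matches:
            hits.append((name, shape_sim, matches))
    hits.sort(key=lambda h: (-h[2], -h[1]))
    return hits


# Made-up target features and candidate conformers.
target = ["acceptor", "acceptor", "donor"]
candidates = [
    ("a", 0.70, ["acceptor", "donor"]),
    ("b", 0.90, ["acceptor"]),
    ("c", 0.50, ["acceptor", "acceptor", "donor"]),
    ("d", 0.65, ["acceptor", "acceptor", "donor", "aromatic"]),
]
print(score_hits(target, candidates))
```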
15. Bacterial 23S rRNA
• Pharmacophore features of Active fragment
• Shape of Active fragment w/ features
highlighted
• Hitlist of one conformation of Active
fragment with highest scoring matches
• Shapes aligned for the best shape
fingerprint score
16. Bacterial 23S rRNA
Showing hitlist...
The user moves the slider bar on the ‘red’ HBond
Acceptor to find the shape-matched fragment that
also has an HBond Acceptor ‘close’ in distance
space to the target.
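The slider behaviour described here amounts to a distance filter on already-aligned hits. The sketch below assumes each hit exposes the 3D coordinates of its features of the selected type in the target's frame of reference; the function name, data layout, and coordinates are illustrative assumptions.

```python
import math


def within_slider(target_xyz, hits, max_distance):
    """Keep hits that have a same-type feature within max_distance of the target feature.

    hits: iterable of (name, list_of_xyz) for features of the selected type,
    with coordinates taken after shape alignment to the target.
    Returns (name, nearest_distance) pairs sorted by distance.
    """
    kept = []
    for name, coords in hits:
        nearest = min((math.dist(target_xyz, c) for c in coords), default=math.inf)
        if nearest <= max_distance:
            kept.append((name, nearest))
    return sorted(kept, key=lambda item: item[1])


# Made-up acceptor positions for four shape-matched fragments.
acceptor = (0.0, 0.0, 0.0)
hits = [
    ("a", [(1.0, 0.0, 0.0)]),
    ("b", [(3.0, 0.0, 0.0), (0.0, 0.5, 0.0)]),
    ("c", []),
    ("d", [(0.0, 2.0, 0.0)]),
]
print(within_slider(acceptor, hits, max_distance=1.5))
```

Raising `max_distance` corresponds to moving the slider outward, which admits fragments whose matching feature lies farther from the target's feature.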
17. Future Work
• Adding in Thresholding
• More experiments in industrial
settings
• Expanding the pharmacophore
default feature definitions
• Better visualization tools