2. Types of File formats
Elsevier MDL supports a number of file formats for representation and
communication of chemical information.
Name      Description
molfiles  Each molfile describes a single molecular structure, which can
          contain disjoint fragments such as salts.
SDfiles   Structure-data files, which contain data for any number of
          molecules. SDfiles are the primary format for large-scale data
          transfer between MDL databases.
RGfiles   An RGfile describes a single molecular query with Rgroups.
          Each RGfile is a combination of Ctabs defining the root
          molecule and each member of each Rgroup in the query.
rxnfiles  Reaction files. Each rxnfile contains the structural information
          for the reactants and products of a single reaction.
RDfiles   Reaction-data files. An RDfile is a more general format that can
          include reactions as well as molecules.
4. Connection Table [Ctab]
A connection table (Ctab) contains information describing the structural
relationships and properties of a collection of atoms. The connection table is
fundamental to all of the MDL file formats.
Counts line:
9 9 0 0 0 0 0 0 0 0999 V2000
Atom block:
-1.0200 1.5300 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5100 2.4100 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5000 2.3900 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.0000 3.2700 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0300 3.2700 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
0.5000 4.1500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5000 4.1500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.0100 3.2800 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.0300 3.2800 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
Bond block:
1 2 1 0
2 8 1 0
2 3 2 3
3 4 1 0
4 5 2 0
4 6 1 0
6 7 2 3
7 8 1 0
8 9 2 0
5. Ctab Features
Parts of Ctab      Description
Counts Line        Important specifications here relate to the number of
                   atoms, bonds, and atom lists, the chiral flag setting,
                   and the Ctab version.
Atom Block         Specifies the atomic symbol and any mass difference,
                   charge, stereochemistry, and associated hydrogens for
                   each atom.
Bond Block         Specifies the two atoms connected by the bond, the
                   bond type, and any bond stereochemistry and topology
                   (chain or ring properties) for each bond.
Properties Block   Provides for future expandability of Ctab features,
                   while maintaining compatibility with earlier Ctab
                   configurations.
6. 1. Counts Line
aaabbblllfffcccsssxxxrrrpppiiimmmvvvvvv
where
• aaa = number of atoms (current max 255) [Generic]
• bbb = number of bonds (current max 255) [Generic]
• lll = number of atom lists (max 30) [Query]
• fff = (obsolete)
• ccc = chiral flag: 0 = not chiral, 1 = chiral [Generic]
• sss = number of stext entries [MDL ISIS/Desktop]
• xxx, rrr, ppp, iii = (obsolete)
• mmm = number of lines of additional properties, including the M END line.
No longer supported; the default is set to 999. [Generic]
• vvvvvv = Ctab version (V2000 in the examples here)
The following counts line shows six atoms, five bonds, the CHIRAL flag on, and three lines in the
properties block:
6 5 0 0 1 0 3 V2000
The following counts line shows 9 atoms, 9 bonds, and the CHIRAL flag off:
9 9 0 0 0 0 0 0 0 0999 V2000
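Because the counts line is fixed-width, it can be read by slicing columns rather than splitting on whitespace. The following is a minimal sketch, assuming the standard V2000 layout of eleven 3-character fields followed by a 6-character version field; the function name `parse_counts_line` and the dictionary keys are illustrative, not part of any MDL tool. The example lines are rebuilt with explicit padding, since the spacing shown above has been collapsed.

```python
def parse_counts_line(line: str) -> dict:
    """Slice a V2000 counts line into the fields described above."""

    def as_int(text: str) -> int:
        # Blank (obsolete or unused) fields are treated as 0.
        text = text.strip()
        return int(text) if text else 0

    line = line.ljust(39)  # 11 fields * 3 chars + 6 chars for the version
    return {
        "atoms": as_int(line[0:3]),         # aaa
        "bonds": as_int(line[3:6]),         # bbb
        "atom_lists": as_int(line[6:9]),    # lll
        "chiral": as_int(line[12:15]) == 1, # ccc
        "stext": as_int(line[15:18]),       # sss
        "prop_lines": as_int(line[30:33]),  # mmm
        "version": line[33:39].strip(),     # vvvvvv
    }


def build_counts_line(fields: list) -> str:
    """Hypothetical helper: right-align each field in 3 columns."""
    return "".join(f"{value:>3}" for value in fields) + " V2000"


# First example: 6 atoms, 5 bonds, chiral flag on, 3 property lines.
first = parse_counts_line(build_counts_line([6, 5, 0, 0, 1, 0, "", "", "", "", 3]))
# Second example: 9 atoms, 9 bonds, chiral flag off, default 999.
second = parse_counts_line(build_counts_line([9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 999]))
print(first)
print(second)
```

Slicing by column also handles lines where adjacent fields touch, as in `0999` above, which a whitespace split would misread.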
7. 2. Atom Block
The Atom Block is made up of atom lines, one line per atom, with the
following format:
xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee
Field   Meaning              Values
x y z   Atom coordinates
aaa     Atom symbol          Entry in periodic table; L for atom list; A, Q, or * for
                             unspecified atom; LP for lone pair; R# for Rgroup label
dd      Mass difference      -3, -2, -1, 0, 1, 2, 3, 4 (0 if value beyond these limits)
ccc     Charge               0 = uncharged or value other than these, 1 = +3, 2 = +2,
                             3 = +1, 4 = doublet radical, 5 = -1, 6 = -2, 7 = -3
sss     Atom stereo parity   0 = not stereo, 1 = odd, 2 = even, 3 = either or unmarked
                             stereo center
hhh     Hydrogen count + 1   1 = H0, 2 = H1, 3 = H2, 4 = H3, 5 = H4
bbb     Stereo care box      0 = ignore stereo configuration of this double bond atom,
                             1 = stereo configuration of double bond atom must match
vvv     Valence              0 = no marking (default), 1 to 14 = valence 1 to 14,
                             15 = zero valence
HHH     H0 designator        0 = not specified, 1 = no H atoms allowed
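The charge and hydrogen-count fields are stored as codes rather than literal values, so a reader has to decode them. The sketch below assumes the standard V2000 column positions (three 10-character coordinates, one space, a 3-character symbol, then `dd`, `ccc`, `sss`, `hhh`); `parse_atom_line` and `format_atom_line` are hypothetical helper names used only for illustration.

```python
# Charge codes from the table above; code 4 marks a doublet radical, not a charge.
CHARGE_BY_CODE = {0: 0, 1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}


def _int_field(text: str) -> int:
    # Blank fields are treated as 0.
    text = text.strip()
    return int(text) if text else 0


def parse_atom_line(line: str) -> dict:
    """Decode the leading fields of one V2000 atom line."""
    line = line.ljust(45)
    charge_code = _int_field(line[36:39])
    h_code = _int_field(line[42:45])
    return {
        "x": float(line[0:10]),
        "y": float(line[10:20]),
        "z": float(line[20:30]),
        "symbol": line[31:34].strip(),
        "mass_difference": _int_field(line[34:36]),
        "charge": CHARGE_BY_CODE.get(charge_code, 0),
        "doublet_radical": charge_code == 4,
        "stereo_parity": _int_field(line[39:42]),
        # hhh stores "hydrogen count + 1"; 0 means no count was given.
        "hydrogen_count": h_code - 1 if h_code > 0 else None,
    }


def format_atom_line(x, y, z, symbol, charge_code=0, h_code=0):
    """Hypothetical helper that writes a correctly padded atom line."""
    return f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}{0:>2}{charge_code:>3}{0:>3}{h_code:>3}"


carbon = parse_atom_line(format_atom_line(-1.02, 1.53, 0.0, "C"))
cation = parse_atom_line(format_atom_line(0.0, 0.0, 0.0, "N", charge_code=3, h_code=4))
print(carbon)
print(cation)
```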
8. 3. Bond Block
The Bond Block is made up of bond lines, one line per bond, with the following format:
111222tttsssxxxrrrccc
Field   Meaning              Values
111     First atom number    1 to number of atoms
222     Second atom number   1 to number of atoms
ttt     Bond type            1 = Single, 2 = Double, 3 = Triple, 4 = Aromatic,
                             5 = Single or Double, 6 = Single or Aromatic,
                             7 = Double or Aromatic, 8 = Any
sss     Bond stereo          Single bonds: 0 = not stereo, 1 = Up, 4 = Either, 6 = Down.
                             Double bonds: 0 = use x-, y-, z-coords from atom block to
                             determine cis or trans, 3 = cis or trans (either) double bond
rrr     Bond topology        0 = Either, 1 = Ring, 2 = Chain
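A bond line can be decoded the same way as the other fixed-width lines. This is a minimal sketch that assumes seven 3-character fields in the order given by the format string; the names `parse_bond_line`, `BOND_TYPES`, and `TOPOLOGY` are illustrative.

```python
BOND_TYPES = {
    1: "single", 2: "double", 3: "triple", 4: "aromatic",
    5: "single or double", 6: "single or aromatic",
    7: "double or aromatic", 8: "any",
}
TOPOLOGY = {0: "either", 1: "ring", 2: "chain"}


def parse_bond_line(line: str) -> dict:
    """Decode one V2000 bond line (fields 111, 222, ttt, sss, rrr)."""

    def as_int(text: str) -> int:
        # Blank fields are treated as 0.
        text = text.strip()
        return int(text) if text else 0

    line = line.ljust(21)  # 7 fields * 3 chars
    return {
        "first_atom": as_int(line[0:3]),
        "second_atom": as_int(line[3:6]),
        "bond_type": BOND_TYPES.get(as_int(line[6:9]), "unknown"),
        "stereo": as_int(line[9:12]),
        "topology": TOPOLOGY.get(as_int(line[15:18]), "unknown"),
    }


# The bond "4 5 2 0" from the Ctab example, rebuilt with 3-column padding.
padded = "".join(f"{value:>3}" for value in [4, 5, 2, 0])
bond = parse_bond_line(padded)
print(bond)
```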
9. Mol File
A molfile consists of a header block and a connection table. The slide
shows a molfile for alanine next to its structure, with these callouts:
• Header block: identifies the molfile: molecule name, user's name, program, date, and other miscellaneous information and comments.
• Charge entry: atom 4: charge +1; atom 6: charge -1.
• Isotope entry (1 entry): atom 3: mass = 13.
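In a V2000 molfile, charge and isotope annotations like these are carried by `M  CHG` and `M  ISO` lines in the properties block, each holding an entry count followed by atom-number/value pairs. The line syntax is not spelled out on the slide, so the two sample lines below are an assumption reconstructed from the callouts, and `parse_property_line` is a hypothetical name. A minimal sketch:

```python
def parse_property_line(line: str):
    """Return (kind, {atom_number: value}) for M  CHG / M  ISO lines, else None."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "M" or tokens[1] not in ("CHG", "ISO"):
        return None
    count = int(tokens[2])               # number of atom/value pairs on this line
    values = tokens[3:3 + 2 * count]
    pairs = {int(values[i]): int(values[i + 1]) for i in range(0, len(values), 2)}
    return tokens[1], pairs


# Assumed lines matching the callouts: two charges, one isotope.
charge_entry = parse_property_line("M  CHG  2   4   1   6  -1")
isotope_entry = parse_property_line("M  ISO  1   3  13")
end_entry = parse_property_line("M  END")
print(charge_entry, isotope_entry, end_entry)
```

Splitting on whitespace is enough here because each value sits in its own space-padded column; a stricter reader would slice fixed columns instead.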
13. RGfiles
In RGfiles, lines beginning with $ define the overall structure of the Rgroup query; the
molfile header block is embedded in the Rgroup header block. In addition to the
primary connection table (Ctab block) for the root structure, a Ctab block defines each
member (*m) within each Rgroup (*r).
15. SDfile
An SDfile (structure-data file) contains the structural information and associated data items for
one or more compounds.
*l is repeated for each line of data
*d is repeated for each data item
*c is repeated for each compound
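The three levels of repetition (lines within a data item, data items within a compound, compounds within the file) map naturally onto nested loops. The sketch below assumes the usual SDfile conventions, which the slide does not show: each compound ends with a `$$$$` line, the molfile part ends at `M  END`, each data header starts with `>` and names its field in angle brackets, and a blank line closes a data item. The field names `NAME` and `COMMENT` and the stub molfile are placeholders.

```python
import re


def split_compound(lines: list) -> tuple:
    """Split one compound into (molfile_lines, {field: [data lines]})."""
    end = lines.index("M  END") + 1       # molfile part ends at M  END
    data, field = {}, None
    for line in lines[end:]:
        if line.startswith(">"):          # data header, e.g. ">  <NAME>"
            match = re.search(r"<([^>]+)>", line)
            field = match.group(1) if match else None
            if field is not None:
                data[field] = []
        elif not line.strip():            # blank line closes the data item
            field = None
        elif field is not None:           # *l: one entry per line of data
            data[field].append(line)
    return lines[:end], data


def parse_sdfile(text: str) -> list:
    """Return one (molfile_lines, data) tuple per compound (*c)."""
    compounds, current = [], []
    for line in text.splitlines():
        if line.startswith("$$$$"):
            compounds.append(split_compound(current))
            current = []
        else:
            current.append(line)
    return compounds


stub_molfile = ["demo", "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END"]
sample = "\n".join(
    stub_molfile + [">  <NAME>", "first", "", ">  <COMMENT>", "line one", "line two", "", "$$$$"]
    + stub_molfile + [">  <NAME>", "second", "", "$$$$"]
)
compounds = parse_sdfile(sample)
print(len(compounds), compounds[0][1], compounds[1][1])
```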
17. RXNfile
Rxnfiles contain structural data for the reactants and
products of a reaction.
where:
*r is repeated for each reactant
*p is repeated for each product
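A reader can use the reactant and product counts to assign each embedded molfile to the right side of the reaction. This sketch assumes the V2000 rxnfile layout, which is not shown on the slide: a `$RXN` line, three further header lines, a counts line with two 3-character fields (reactants, products), and a `$MOL` line before each molfile. The function name `split_rxnfile` and the stub molfiles are illustrative.

```python
def split_rxnfile(text: str) -> tuple:
    """Return (reactant_molfiles, product_molfiles) as lists of line lists."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("$RXN"):
        raise ValueError("not an rxnfile: missing $RXN line")
    n_reactants = int(lines[4][0:3])      # rrr
    n_products = int(lines[4][3:6])       # ppp
    molfiles = []
    for line in lines[5:]:
        if line.startswith("$MOL"):       # delimiter before each molfile
            molfiles.append([])
        elif molfiles:
            molfiles[-1].append(line)
    if len(molfiles) != n_reactants + n_products:
        raise ValueError("molfile count does not match the counts line")
    return molfiles[:n_reactants], molfiles[n_reactants:]


def stub(name: str) -> list:
    """Hypothetical empty molfile used only to exercise the splitter."""
    return ["$MOL", name, "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END"]


sample = "\n".join(
    ["$RXN", "demo reaction", "", "", "  2  1"] + stub("r1") + stub("r2") + stub("p1")
)
reactants, products = split_rxnfile(sample)
print([m[0] for m in reactants], [m[0] for m in products])
```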
19. RDfiles
• An RDfile (reaction-data file) consists of a set of editable “records”. Each record
defines a molecule or reaction, and its associated data.
• The [RDfile Header] must occur at the beginning of the physical file and
identifies the file as an RDfile. A version stamp of 1 is given for future expansion
of the format.
• $DATM: Date/time (M/D/Y, c) stamp. This line is treated as a comment and
ignored when the file is read.
*d is repeated for each data item
*r is repeated for each reaction or molecule
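The record structure can be walked line by line. The sketch below assumes the usual RDfile keywords, which do not appear on the slide apart from `$DATM`: a `$RDFILE` header line carrying the version, `$RFMT` or `$MFMT` opening a reaction or molecule record, and `$DTYPE`/`$DATUM` pairs for each data item. Only single-line data values are handled, and the field names, date, and record bodies in the sample are placeholders.

```python
def parse_rdfile(text: str) -> tuple:
    """Return (version, records); each record has kind, body lines, and data."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("$RDFILE"):
        raise ValueError("missing $RDFILE header")
    version = lines[0].split()[1]
    records, dtype = [], None
    for line in lines[1:]:
        if line.startswith("$DATM"):
            continue                              # treated as a comment
        if line.startswith(("$RFMT", "$MFMT")):   # *r: start of a new record
            kind = "reaction" if line.startswith("$RFMT") else "molecule"
            records.append({"kind": kind, "body": [], "data": {}})
        elif not records:
            continue                              # ignore anything before a record
        elif line.startswith("$DTYPE"):           # *d: name of the next data item
            dtype = line[len("$DTYPE"):].strip()
        elif line.startswith("$DATUM") and dtype is not None:
            records[-1]["data"][dtype] = line[len("$DATUM"):].strip()
            dtype = None
        else:
            records[-1]["body"].append(line)      # embedded molfile or rxnfile lines
    return version, records


sample = "\n".join([
    "$RDFILE 1",
    "$DATM 01/01/2000 12:00",
    "$MFMT",
    "(molfile lines)",
    "$DTYPE NAME",
    "$DATUM first",
    "$RFMT",
    "(rxnfile lines)",
    "$DTYPE NOTE",
    "$DATUM second",
])
version, records = parse_rdfile(sample)
print(version, [(r["kind"], r["data"]) for r in records])
```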
21. Mol2 files from TRIPOS
Originally developed by Tripos. A mol2 file contains atom coordinates, bonds, and substructure
information. This format supports partial charges and isotopes.
• Lines 1, 2, 3, 5, and 6 in the example are comments. They contain
the molecule name and information about the time
the molecule was created and last modified.
• Lines 8, 15, 28, and 41 in the example are Record
Type Indicators (RTIs). An RTI indicates the type
of data that follows it in a .mol2 file.
• Lines 9-12, 16-27, 29-40, and 42 are all data
records.
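The comment / RTI / data-record distinction is enough to split a mol2 file into sections. This is a minimal sketch, assuming comment lines start with `#` and RTIs start with `@<TRIPOS>`; the function name `split_mol2_sections` and the short sample are illustrative, not the example referred to above.

```python
RTI_PREFIX = "@<TRIPOS>"


def split_mol2_sections(text: str) -> list:
    """Return [(record_type, [data records])] in file order."""
    sections = []
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue                                   # comment or blank line
        if line.startswith(RTI_PREFIX):                # Record Type Indicator
            sections.append((line[len(RTI_PREFIX):].strip(), []))
        elif sections:
            sections[-1][1].append(line)               # data record for current RTI
    return sections


sample = "\n".join([
    "# Name: demo",
    "",
    "@<TRIPOS>MOLECULE",
    "demo",
    "@<TRIPOS>ATOM",
    "1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT",
    "@<TRIPOS>BOND",
    "1 1 2 ar",
])
sections = split_mol2_sections(sample)
print([name for name, _ in sections])
```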
22. Parts of mol2 file
@<TRIPOS>MOLECULE
The first data line is the name of the molecule. The second data line contains the number of atoms, bonds,
substructures, features, and sets associated with the molecule. The third data line is the molecule type. The fourth data
line tells the type of charges associated with the molecule. The fifth data line contains the internal SYBYL status bits
associated with the molecule. The last data line contains any comment which may be associated with the molecule.
@<TRIPOS>ATOM
atom_id atom_name x y z atom_type [subst_id [subst_name [charge [status_bit]]]]
Example :
1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT
In the example above, the atom has ID number 1. It is named CA and is located at (-0.149, 0.299, 0.000). Its atom type is C.3. It
belongs to the substructure with ID 1, which is named ALA1. The charge associated with the atom is 0.000, and the SYBYL status
bits associated with the atom are BACKBONE, DICT, and DIRECT.
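The nested brackets in the atom record format mean each optional field can appear only if the one before it is present, so a whitespace split followed by a length check is sufficient. A minimal sketch, with `parse_mol2_atom` as a hypothetical name:

```python
def parse_mol2_atom(line: str) -> dict:
    """Parse one @<TRIPOS>ATOM data record, including trailing optional fields."""
    tokens = line.split()
    if len(tokens) < 6:
        raise ValueError("an atom record needs at least six fields")
    atom = {
        "atom_id": int(tokens[0]),
        "atom_name": tokens[1],
        "x": float(tokens[2]),
        "y": float(tokens[3]),
        "z": float(tokens[4]),
        "atom_type": tokens[5],
    }
    optional = tokens[6:]
    if len(optional) > 0:
        atom["subst_id"] = int(optional[0])
    if len(optional) > 1:
        atom["subst_name"] = optional[1]
    if len(optional) > 2:
        atom["charge"] = float(optional[2])
    if len(optional) > 3:
        atom["status_bits"] = optional[3].split("|")   # bits are separated by "|"
    return atom


atom = parse_mol2_atom("1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT")
minimal = parse_mol2_atom("2 CB 1.000 2.000 3.000 C.3")
print(atom)
print(minimal)
```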
@<TRIPOS>BOND
bond_id origin_atom_id target_atom_id bond_type [status_bits]
Example: 1 1 2 ar
The bond in this example has ID number 1 and connects atoms 1 and 2. It is an aromatic bond.
@<TRIPOS>SUBSTRUCTURE
subst_id subst_name root_atom [subst_type [dict_type [chain [sub_type [inter_bonds [status [comment]]]]]]]
Example: 1 BENZENE1 PERM 0 **** ****** 0 ROOT
The substructure has ID 1 and is named BENZENE1. It is of type PERM and is associated with dictionary type 0. The SYBYL status
bits indicate that it is the ROOT substructure.