Molecular File Formats
Types of File formats
Elsevier MDL supports a number of file formats for the representation and
communication of chemical information.
Name        Description
molfiles    Each molfile describes a single molecular structure, which can
            contain disjoint fragments such as salts.
SDfiles     Structure-data files, which contain structures and data for any
            number of molecules. SDfiles are the primary format for
            large-scale data transfer between MDL databases.
RGfiles     An RGfile describes a single molecular query with Rgroups.
            Each RGfile is a combination of Ctabs defining the root
            molecule and each member of each Rgroup in the query.
rxnfiles    Reaction files. Each rxnfile contains the structural information
            for the reactants and products of a single reaction.
RDfiles     Reaction-data files. The RDfile is a more general format that can
            include reactions as well as molecules.
File Formats
http://c4.cabrillo.edu/404/ctfile.pdf
Connection Table [Ctab]
A connection table (Ctab) contains information describing the structural
relationships and properties of a collection of atoms. The connection table is
fundamental to all of the MDL file formats.
Counts line:
  9  9  0  0  0  0  0  0  0  0999 V2000
Atom block:
   -1.0200    1.5300    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5100    2.4100    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5000    2.3900    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0000    3.2700    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0300    3.2700    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.5000    4.1500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5000    4.1500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0100    3.2800    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.0300    3.2800    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
Bond block:
  1  2  1  0
  2  8  1  0
  2  3  2  3
  3  4  1  0
  4  5  2  0
  4  6  1  0
  6  7  2  3
  7  8  1  0
  8  9  2  0
Ctab Features
Parts of Ctab      Description
Counts Line        Important specifications here relate to the number of
                   atoms, bonds, and atom lists, the chiral flag setting,
                   and the Ctab version.
Atom Block         Specifies the atomic symbol and any mass difference,
                   charge, stereochemistry, and associated hydrogens for
                   each atom.
Bond Block         Specifies the two atoms connected by the bond, the
                   bond type, and any bond stereochemistry and topology
                   (chain or ring properties) for each bond.
Properties Block   Provides for future expandability of Ctab features,
                   while maintaining compatibility with earlier Ctab
                   configurations.
1. Counts Line
aaabbblllfffcccsssxxxrrrpppiiimmmvvvvvv
where
• aaa = number of atoms (current max 255) [Generic]
• bbb = number of bonds (current max 255) [Generic]
• lll = number of atom lists (max 30) [Query]
• fff = (obsolete)
• ccc = chiral flag: 0 = not chiral, 1 = chiral [Generic]
• sss = number of stext entries [MDL ISIS/Desktop]
• xxx, rrr, ppp, iii = (obsolete)
• mmm = number of lines of additional properties, including the M END line.
  No longer supported; the default is set to 999. [Generic]
• vvvvvv = Ctab version: V2000 or V3000 [Generic]
Each field except the version is three characters wide, so adjacent values can
run together (for example 0999 below is the two fields 0 and 999).
Example: six atoms, five bonds, the CHIRAL flag on, and three lines in the
properties block (the obsolete fields before mmm are left blank):
  6  5  0  0  1  0              3 V2000
Example: nine atoms, nine bonds, the CHIRAL flag off:
  9  9  0  0  0  0  0  0  0  0999 V2000
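Because the counts line is column-based, a reader should slice it rather than split on whitespace. The following is a minimal sketch, assuming the full 39-column V2000 layout from the CTfile specification; the names `Counts` and `parse_counts_line` are illustrative and do not come from any MDL library.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    atoms: int
    bonds: int
    atom_lists: int
    chiral: bool
    stext: int
    version: str


def _int_field(line: str, start: int, width: int = 3) -> int:
    """Read one fixed-width integer field; a blank field counts as 0."""
    text = line[start:start + width].strip()
    return int(text) if text else 0


def parse_counts_line(line: str) -> Counts:
    """Slice a V2000 counts line by column position (hypothetical helper)."""
    line = line.ljust(39)
    return Counts(
        atoms=_int_field(line, 0),        # aaa
        bonds=_int_field(line, 3),        # bbb
        atom_lists=_int_field(line, 6),   # lll
        chiral=_int_field(line, 12) == 1, # ccc
        stext=_int_field(line, 15),       # sss
        version=line[33:39].strip(),      # vvvvvv
    )


counts = parse_counts_line("  9  9  0  0  0  0  0  0  0  0999 V2000")
print(counts)
```

Splitting on whitespace would read `0999` as a single number, which is why the sketch slices by position.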
2. Atom Block
The Atom Block is made up of atom lines, one line per atom, with the
following format:
xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee
Field   Meaning              Values
x y z   Atom coordinates
aaa     Atom symbol          Entry in periodic table, or L for atom list; A, Q, * for
                             unspecified atom; LP for lone pair; R# for Rgroup label
dd      Mass difference      -3, -2, -1, 0, 1, 2, 3, 4 (0 if value is beyond these limits)
ccc     Charge               0 = uncharged or value other than these, 1 = +3, 2 = +2,
                             3 = +1, 4 = doublet radical, 5 = -1, 6 = -2, 7 = -3
sss     Atom stereo parity   0 = not stereo, 1 = odd, 2 = even, 3 = either or unmarked
                             stereo center
hhh     Hydrogen count + 1   1 = H0, 2 = H1, 3 = H2, 4 = H3, 5 = H4
bbb     Stereo care box      0 = ignore stereo configuration of this double bond atom,
                             1 = stereo configuration of double bond atom must match
vvv     Valence              0 = no marking (default), (1 to 14) = (1 to 14),
                             15 = zero valence
HHH     H0 designator        0 = not specified, 1 = no H atoms allowed
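To make the charge and hydrogen-count encodings concrete, here is a small sketch that decodes one atom line. It assumes the V2000 column positions from the CTfile specification (three 10-character coordinates, one space, a 3-character symbol, then the fields listed above); `parse_atom_line` is a hypothetical helper, and the nitrogen line is an invented example.

```python
# Charge codes from the atom block table: code -> formal charge.
CHARGE_BY_CODE = {0: 0, 1: 3, 2: 2, 3: 1, 5: -1, 6: -2, 7: -3}


def _int_field(text: str) -> int:
    """Convert a fixed-width field to int; a blank field counts as 0."""
    text = text.strip()
    return int(text) if text else 0


def parse_atom_line(line: str) -> dict:
    """Decode the leading fields of a V2000 atom line (illustrative sketch)."""
    line = line.ljust(54)
    charge_code = _int_field(line[36:39])
    hydrogen_code = _int_field(line[42:45])
    return {
        "xyz": (float(line[0:10]), float(line[10:20]), float(line[20:30])),
        "symbol": line[31:34].strip(),
        "mass_difference": _int_field(line[34:36]),
        "charge": CHARGE_BY_CODE.get(charge_code, 0),
        "doublet_radical": charge_code == 4,
        "stereo_parity": _int_field(line[39:42]),
        # hhh stores "hydrogen count + 1"; 0 means the field is unset.
        "query_h_count": hydrogen_code - 1 if hydrogen_code else None,
    }


# First atom of the connection table shown earlier in the deck.
carbon = parse_atom_line(
    "   -1.0200    1.5300    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0"
)
# Invented example: a nitrogen with charge code 3, which means +1.
nitrogen = parse_atom_line(
    "    0.0000    0.0000    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0"
)
print(carbon["symbol"], carbon["charge"])
print(nitrogen["symbol"], nitrogen["charge"])
```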
3. Bond Block
The Bond Block is made up of bond lines, one line per bond, with the following format:
111222tttsssxxxrrrccc
Field   Meaning              Values
111     First atom number    1 - number of atoms
222     Second atom number   1 - number of atoms
ttt     Bond type            1 = Single, 2 = Double, 3 = Triple, 4 = Aromatic,
                             5 = Single or Double, 6 = Single or Aromatic,
                             7 = Double or Aromatic, 8 = Any
sss     Bond stereo          Single bonds: 0 = not stereo, 1 = Up, 4 = Either, 6 = Down.
                             Double bonds: 0 = use x-, y-, z-coords from atom block to
                             determine cis or trans, 3 = cis or trans (either) double bond
rrr     Bond topology        0 = Either, 1 = Ring, 2 = Chain
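The bond stereo code means different things for single and double bonds, which is easy to miss in the table. The sketch below decodes one bond line under that rule; the lookup tables copy the values listed above, and `parse_bond_line` is an illustrative name rather than part of any toolkit.

```python
BOND_TYPES = {
    1: "single", 2: "double", 3: "triple", 4: "aromatic",
    5: "single or double", 6: "single or aromatic",
    7: "double or aromatic", 8: "any",
}
SINGLE_BOND_STEREO = {0: "not stereo", 1: "up", 4: "either", 6: "down"}
DOUBLE_BOND_STEREO = {0: "from coordinates", 3: "cis or trans (either)"}
TOPOLOGY = {0: "either", 1: "ring", 2: "chain"}


def parse_bond_line(line: str) -> dict:
    """Decode a V2000 bond line laid out as 111222tttsssxxxrrrccc."""
    padded = line.ljust(21)

    def field(start: int) -> int:
        text = padded[start:start + 3].strip()
        return int(text) if text else 0

    bond_type, stereo = field(6), field(9)
    if bond_type == 1:
        stereo_text = SINGLE_BOND_STEREO.get(stereo, "unknown")
    elif bond_type == 2:
        stereo_text = DOUBLE_BOND_STEREO.get(stereo, "unknown")
    else:
        stereo_text = "not applicable"
    return {
        "atoms": (field(0), field(3)),
        "type": BOND_TYPES.get(bond_type, "unknown"),
        "stereo": stereo_text,
        "topology": TOPOLOGY.get(field(15), "unknown"),
    }


# Third bond of the connection table shown earlier in the deck.
bond = parse_bond_line("  2  3  2  3")
print(bond)
```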
Mol File
A molfile consists of a header block and a connection table. The example
on this slide is a molfile for alanine, annotated as follows:
Header block: identifies the molfile (molecule name, user's name, program,
date, and other miscellaneous information and comments).
Properties block:
• Two charge entries: atom 4 has charge +1 and atom 6 has charge -1.
• One isotope entry: atom 3 has mass 13.
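The charge and isotope annotations correspond to `M  CHG` and `M  ISO` lines in the properties block. A minimal sketch of reading them follows; the three property lines are reconstructed from the annotations above, not copied from the original figure, and `parse_properties` is a hypothetical helper that splits on whitespace for brevity.

```python
def parse_properties(lines):
    """Collect charges and isotopes from V2000 property lines.

    Each line looks like 'M  CHG  n  atom value  atom value ...'.
    Returns two dicts keyed by atom number.
    """
    charges, isotopes = {}, {}
    for line in lines:
        if line.startswith("M  END"):
            break
        tokens = line.split()
        if len(tokens) < 3 or tokens[0] != "M":
            continue
        if tokens[1] == "CHG":
            target = charges
        elif tokens[1] == "ISO":
            target = isotopes
        else:
            continue  # other property types are ignored in this sketch
        count = int(tokens[2])
        pairs = tokens[3:3 + 2 * count]
        for atom, value in zip(pairs[0::2], pairs[1::2]):
            target[int(atom)] = int(value)
    return charges, isotopes


# Reconstructed from the slide annotations (assumption, see lead-in).
property_lines = [
    "M  CHG  2   4   1   6  -1",
    "M  ISO  1   3  13",
    "M  END",
]
charges, isotopes = parse_properties(property_lines)
print(charges, isotopes)
```

Charges given in `M  CHG` lines are actual signed values, unlike the coded `ccc` field in the atom block.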
Representation of Stereochemistry
What is Stereochemistry?
http://www.chemhelper.com/enantiomers.html
Representation of Stereochemistry: Atom Block
Representation of Stereochemistry: Bond Block
1 = stereo bond up
RGfiles
In RGfiles, lines beginning with $ define the overall structure of the Rgroup query; the
molfile header block is embedded in the Rgroup header block. In addition to the
primary connection table (Ctab block) for the root structure, a Ctab block defines each
member (*m) within each Rgroup (*r).
Example of RGfile
SDfile
An SDfile (structure-data file) contains the structural information and associated data items for
one or more compounds.
*l is repeated for each line of data
*d is repeated for each data item
*c is repeated for each compound
Example of SDfile
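As a rough illustration of the compound / data item / data line nesting, the sketch below splits an SDfile into records at the `$$$$` delimiter and collects each `> <FIELD>` data item. The two single-atom records are invented placeholders, and the function names are assumptions made for this example.

```python
import re


def iter_sd_records(text):
    """Yield the lines of each compound record, split at '$$$$'."""
    record = []
    for line in text.splitlines():
        if line.startswith("$$$$"):
            yield record
            record = []
        else:
            record.append(line)


def split_sd_record(lines):
    """Return (molfile_lines, data_items) for one record."""
    end = next(i for i, line in enumerate(lines) if line.startswith("M  END"))
    data, name = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            match = re.search(r"<([^>]+)>", line)  # field name in <...>
            name = match.group(1) if match else None
            if name is not None:
                data[name] = []
        elif not line.strip():
            name = None  # a blank line ends the current data item
        elif name is not None:
            data[name].append(line)
    return lines[:end + 1], {key: "\n".join(value) for key, value in data.items()}


def tiny_molfile(name, symbol):
    """Build a one-atom placeholder molfile (illustration only)."""
    return [
        name, "", "",
        "  1  0  0  0  0  0  0  0  0  0999 V2000",
        f"    0.0000    0.0000    0.0000 {symbol:<3} 0  0  0  0  0  0  0  0  0  0  0  0",
        "M  END",
    ]


sample = "\n".join(
    tiny_molfile("methane", "C") + ["> <NAME>", "methane", "", "> <FORMULA>", "CH4", "", "$$$$"]
    + tiny_molfile("water", "O") + ["> <NAME>", "water", "", "$$$$"]
)
records = [split_sd_record(record) for record in iter_sd_records(sample)]
for molfile, data in records:
    print(molfile[0], data)
```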
RXNfile
Rxnfiles contain structural data for the reactants and
products of a reaction.
where:
*r is repeated for each reactant
*p is repeated for each product
RXNfile example
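A reader can recover the reactant and product blocks from the counts line and the `$MOL` delimiters. This is a minimal sketch, assuming the V2000 rxnfile layout from the CTfile specification (a `$RXN` line, three further header lines, then a counts line `rrrppp`); the one-atom blocks are placeholders and do not describe a real reaction.

```python
def split_rxnfile(text):
    """Return (reactants, products) as lists of molfile line lists."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("$RXN"):
        raise ValueError("not an rxnfile: missing $RXN line")
    counts = lines[4].ljust(6)
    n_reactants = int(counts[0:3])
    n_products = int(counts[3:6])
    blocks = []
    for line in lines[5:]:
        if line.startswith("$MOL"):
            blocks.append([])      # start a new molfile block
        elif blocks:
            blocks[-1].append(line)
    reactants = blocks[:n_reactants]
    products = blocks[n_reactants:n_reactants + n_products]
    return reactants, products


def tiny_molfile(name, symbol):
    """Build a one-atom placeholder molfile (illustration only)."""
    return [
        name, "", "",
        "  1  0  0  0  0  0  0  0  0  0999 V2000",
        f"    0.0000    0.0000    0.0000 {symbol:<3} 0  0  0  0  0  0  0  0  0  0  0  0",
        "M  END",
    ]


sample = "\n".join(
    ["$RXN", "placeholder reaction", "", "", "  2  1"]
    + ["$MOL"] + tiny_molfile("reactant A", "C")
    + ["$MOL"] + tiny_molfile("reactant B", "O")
    + ["$MOL"] + tiny_molfile("product", "N")
)
reactants, products = split_rxnfile(sample)
print(len(reactants), len(products))
```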
RDfiles
• An RDfile (reaction-data file) consists of a set of editable "records". Each record
defines a molecule or reaction and its associated data.
• The [RDfile Header] must occur at the beginning of the physical file and
identifies the file as an RDfile. A version stamp of 1 is given for future expansion
of the format.
• $DATM: Date/time (M/D/Y, H:m) stamp. This line is treated as a comment and
ignored when the file is read.
*d is repeated for each data item
*r is repeated for each reaction or molecule
RDfile example
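The record structure can be sketched as a small reader: each record begins at a `$RFMT` (reaction) or `$MFMT` (molecule) line, and data items follow as `$DTYPE` / `$DATUM` pairs. This assumes the keyword layout of the CTfile specification; the sample content is invented, and `parse_rdfile` is a hypothetical name.

```python
def parse_rdfile(text):
    """Split an RDfile into records with structure lines and data items."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("$RDFILE"):
        raise ValueError("not an RDfile: missing $RDFILE header")
    records, field = [], None
    for line in lines[1:]:
        if line.startswith("$DATM"):
            continue  # date/time stamp, treated as a comment
        if line.startswith(("$RFMT", "$MFMT")):
            kind = "reaction" if line.startswith("$RFMT") else "molecule"
            records.append({"kind": kind, "structure": [], "data": {}})
            field = None
        elif not records:
            continue  # ignore anything before the first record
        elif line.startswith("$DTYPE"):
            field = line[len("$DTYPE"):].strip()
            records[-1]["data"][field] = ""
        elif line.startswith("$DATUM") and field is not None:
            records[-1]["data"][field] = line[len("$DATUM"):].strip()
        elif field is None:
            records[-1]["structure"].append(line)
        else:
            records[-1]["data"][field] += "\n" + line  # continuation line
    return records


molfile = [
    "methane", "", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
]
sample = "\n".join(
    ["$RDFILE 1", "$DATM    01/02/2024 10:30", "$MFMT $MIREG 1"]
    + molfile
    + ["$DTYPE NAME", "$DATUM methane", "$DTYPE FORMULA", "$DATUM CH4"]
)
rd_records = parse_rdfile(sample)
print(rd_records[0]["kind"], rd_records[0]["data"])
```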
Mol2 files from TRIPOS
The format originates from Tripos. A mol2 file contains atom coordinates, bonds, and
substructure information. This format supports partial charges and isotopes.
• Lines 1, 2, 3, 5 and 6 of the example are comments. They contain
the molecule name and information about the time
the molecule was created and last modified.
• Lines 8, 15, 28, and 41 of the example are Record
Type Indicators (RTIs). An RTI indicates the type
of data that follows it in a .mol2 file.
• Lines 9-12, 16-27, 29-40, and 42 are all data
records.
Parts of mol2 file
@<TRIPOS>MOLECULE
The first data line is the name of the molecule. The second data line contains the number of atoms, bonds,
substructures, features, and sets associated with the molecule. The third data line is the molecule type. The fourth data
line gives the type of charges associated with the molecule. The fifth data line contains the internal SYBYL status bits
associated with the molecule. The last data line contains any comment associated with the molecule.
@<TRIPOS>ATOM
atom_id atom_name x y z atom_type [subst_id [subst_name [charge [status_bit]]]]
Example: 1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT
In this example, the atom has ID number 1. It is named CA and is located at (-0.149, 0.299, 0.000). Its atom type is C.3. It
belongs to the substructure with ID 1, which is named ALA1. The charge associated with the atom is 0.000, and the SYBYL status
bits associated with the atom are BACKBONE, DICT, and DIRECT.
@<TRIPOS>BOND
bond_id origin_atom_id target_atom_id bond_type [status_bits]
Example: 1 1 2 ar
This bond has ID number 1 and connects atoms 1 and 2. It is an aromatic bond.
@<TRIPOS>SUBSTRUCTURE
subst_id subst_name root_atom [subst_type [dict_type [chain [sub_type [inter_bonds [status [comment]]]]]]]
Example: 1 BENZENE1 1 PERM 0 **** ****** 0 ROOT
The substructure has ID 1, is named BENZENE1, and has atom 1 as its root atom. It is of type PERM and is associated with
dictionary type 0. The SYBYL status bits indicate that it is the ROOT substructure.
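Unlike the MDL formats, mol2 records are whitespace-delimited, so a simple split is enough for a sketch. The code below groups data lines under their `@<TRIPOS>` record type indicator and decodes atom and bond records using the field order given above; the function names and the two-atom sample are assumptions made for illustration.

```python
RTI_PREFIX = "@<TRIPOS>"


def split_mol2_sections(text):
    """Group data lines by record type indicator, skipping blanks and comments."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith(RTI_PREFIX):
            current = line[len(RTI_PREFIX):].strip()
            sections[current] = []
        elif current is not None and line.strip() and not line.startswith("#"):
            sections[current].append(line)
    return sections


def parse_mol2_atom(line):
    """Decode one ATOM record; trailing fields are optional."""
    tokens = line.split()
    atom = {
        "id": int(tokens[0]),
        "name": tokens[1],
        "xyz": (float(tokens[2]), float(tokens[3]), float(tokens[4])),
        "type": tokens[5],
    }
    optional = tokens[6:]
    if len(optional) > 0:
        atom["subst_id"] = int(optional[0])
    if len(optional) > 1:
        atom["subst_name"] = optional[1]
    if len(optional) > 2:
        atom["charge"] = float(optional[2])
    if len(optional) > 3:
        atom["status_bits"] = optional[3].split("|")
    return atom


def parse_mol2_bond(line):
    """Decode one BOND record (status bits are ignored in this sketch)."""
    tokens = line.split()
    return {"id": int(tokens[0]), "origin": int(tokens[1]),
            "target": int(tokens[2]), "type": tokens[3]}


atom = parse_mol2_atom("1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT")
bond = parse_mol2_bond("1 1 2 ar")

# Invented two-atom fragment, only to exercise the section splitter.
sample = "\n".join([
    "# placeholder comment",
    "@<TRIPOS>MOLECULE", "example", "2 1 1", "SMALL", "NO_CHARGES", "",
    "@<TRIPOS>ATOM",
    "1 C1 0.000 0.000 0.000 C.ar",
    "2 C2 1.390 0.000 0.000 C.ar",
    "@<TRIPOS>BOND",
    "1 1 2 ar",
])
sections = split_mol2_sections(sample)
print(atom)
print(bond)
print(list(sections))
```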
References
• http://www.tripos.com/data/support/mol2.pdf
• http://accelrys.com/products/informatics/cheminformatics/ctfile-formats/no-fee.php
• Arthur Dalby et al. Description of Several Chemical Structure File Formats Used by
Computer Programs Developed at Molecular Design Limited. J. Chem. Inf. Comput. Sci.
1992, 32, 244-255.
• http://www.chem.ucla.edu/harding/tutorials/stereochem/rsez.pdf
• http://www.chem.ucla.edu/harding/notes/notes_14C_stereo03.pdf
