Molecular File Formats
Types of File formats
Elsevier MDL supports a number of file formats for representation and
communication of chemical information.
Name      Description
molfiles  Each molfile describes a single molecular structure, which can
          contain disjoint fragments such as salts.
SDfiles   Structure-data files, which contain structures and data for any
          number of molecules. SDfiles are the primary format for
          large-scale data transfer between MDL databases.
RGfiles   An RGfile describes a single molecular query with Rgroups.
          Each RGfile is a combination of Ctabs defining the root
          molecule and each member of each Rgroup in the query.
rxnfiles  Reaction files. Each rxnfile contains the structural information
          for the reactants and products of a single reaction.
RDfiles   Reaction-data files. An RDfile is a more general format that can
          include reactions as well as molecules.
File Formats
http://c4.cabrillo.edu/404/ctfile.pdf
Connection Table [Ctab]
A connection table (Ctab) contains information describing the structural
relationships and properties of a collection of atoms. The connection table is
fundamental to all of the MDL file formats.
Counts line:
9 9 0 0 0 0 0 0 0 0999 V2000
Atom block:
-1.0200 1.5300 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5100 2.4100 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5000 2.3900 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.0000 3.2700 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0300 3.2700 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
0.5000 4.1500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5000 4.1500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.0100 3.2800 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.0300 3.2800 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
Bond block:
1 2 1 0
2 8 1 0
2 3 2 3
3 4 1 0
4 5 2 0
4 6 1 0
6 7 2 3
7 8 1 0
8 9 2 0
Ctab Features
Part of Ctab      Description
Counts Line       Important specifications here relate to the number of
                  atoms, bonds, and atom lists, the chiral flag setting,
                  and the Ctab version.
Atom Block        Specifies the atomic symbol and any mass difference,
                  charge, stereochemistry, and associated hydrogens for
                  each atom.
Bond Block        Specifies the two atoms connected by the bond, the
                  bond type, and any bond stereochemistry and topology
                  (chain or ring properties) for each bond.
Properties Block  Provides for future expandability of Ctab features,
                  while maintaining compatibility with earlier Ctab
                  configurations.
1. Counts Line
aaabbblllfffcccsssxxxrrrpppiiimmmvvvvvv
where
• aaa = number of atoms (current max 255)* [Generic]
• bbb = number of bonds (current max 255)* [Generic]
• lll = number of atom lists (max 30)* [Query]
• fff = (obsolete)
• ccc = chiral flag: 0 = not chiral, 1 = chiral [Generic]
• sss = number of stext entries [MDL ISIS/Desktop]
• xxx, rrr, ppp, iii = (obsolete)
• mmm = number of lines of additional properties, including the M END line.
No longer supported; the default is set to 999. [Generic]
• vvvvvv = Ctab version, for example V2000 [Generic]
The following counts line shows six atoms, five bonds, the CHIRAL flag on, and three lines in the
properties block (the obsolete fields are left blank):
6 5 0 0 1 0 3 V2000
The following counts line shows 9 atoms, 9 bonds, and the CHIRAL flag off:
9 9 0 0 0 0 0 0 0 0999 V2000
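As a rough illustration of how the counts line can be read, the sketch below slices it by column position. It assumes the standard V2000 layout of eleven three-character fields followed by a six-character version field, which is what the eleven-number example line corresponds to. The function name `parse_counts_line` and the dictionary keys are illustrative choices, not part of the format.

```python
def parse_counts_line(line):
    """Parse a V2000 counts line laid out in fixed-width columns (assumed layout)."""
    def int_field(start, width=3):
        text = line[start:start + width].strip()
        return int(text) if text else 0  # blank (obsolete) fields read as 0

    return {
        "atoms": int_field(0),            # aaa
        "bonds": int_field(3),            # bbb
        "atom_lists": int_field(6),       # lll
        "chiral": int_field(12) == 1,     # ccc
        "stext_entries": int_field(15),   # sss
        "property_lines": int_field(30),  # mmm
        "version": line[33:39].strip(),   # vvvvvv
    }


# Rebuild the 9-atom example with explicit three-character columns,
# because the slide text has lost its original spacing.
values = [9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 999]
example = "".join(f"{v:3d}" for v in values) + " V2000"
counts = parse_counts_line(example)
print(counts)
```

Slicing by column rather than splitting on whitespace matters here because adjacent fields can run together, as in `0999` in the example.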
2. Atom Block
The Atom Block is made up of atom lines, one line per atom with the
following format.
xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee
Field  Meaning             Values
x y z  Atom coordinates
aaa    Atom symbol         Entry in periodic table, or L for atom list, A, Q, * for
                           unspecified atom, LP for lone pair, or R# for Rgroup label
dd     Mass difference     -3, -2, -1, 0, 1, 2, 3, 4 (0 if value beyond these limits)
ccc    Charge              0 = uncharged or value other than these, 1 = +3, 2 = +2,
                           3 = +1, 4 = doublet radical, 5 = -1, 6 = -2, 7 = -3
sss    Atom stereo parity  0 = not stereo, 1 = odd, 2 = even, 3 = either or unmarked
                           stereo center
hhh    Hydrogen count + 1  1 = H0, 2 = H1, 3 = H2, 4 = H3, 5 = H4
bbb    Stereo care box     0 = ignore stereo configuration of this double bond atom,
                           1 = stereo configuration of double bond atom must match
vvv    Valence             0 = no marking (default), 1 to 14 = valence 1 to 14,
                           15 = zero valence
HHH    H0 designator       0 = not specified, 1 = no H atoms allowed
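The charge field is the least intuitive part of the atom line, since it stores a code rather than the charge itself. The sketch below decodes it along with the coordinates and symbol, assuming the usual V2000 column positions (three ten-character coordinates, one space, then the symbol and flag fields). `format_atom_line` is a hypothetical helper used only to build a correctly spaced test line.

```python
# Charge codes from the ccc field; code 4 marks a doublet radical, not a charge.
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}


def parse_atom_line(line):
    """Read coordinates, symbol, and a few flags from a fixed-width atom line."""
    charge_code = int(line[36:39])
    return {
        "xyz": (float(line[0:10]), float(line[10:20]), float(line[20:30])),
        "symbol": line[31:34].strip(),
        "mass_difference": int(line[34:36]),
        "charge": CHARGE_CODES.get(charge_code, 0),
        "doublet_radical": charge_code == 4,
        "stereo_parity": int(line[39:42]),
    }


def format_atom_line(x, y, z, symbol, mass_diff=0, charge_code=0, parity=0):
    """Hypothetical helper: write the leading columns of an atom line."""
    return (f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}"
            f"{mass_diff:2d}{charge_code:3d}{parity:3d}")


# An oxygen at the coordinates of atom 5 in the example, given charge code 5.
line = format_atom_line(2.03, 3.27, 0.0, "O", charge_code=5)
atom = parse_atom_line(line)
print(atom)
```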
3. Bond Block
The Bond Block is made up of bond lines, one line per bond, with the following format:
111222tttsssxxxrrrccc
Field  Meaning                 Values
111    First atom number       1 - number of atoms
222    Second atom number      1 - number of atoms
ttt    Bond type               1 = Single, 2 = Double, 3 = Triple, 4 = Aromatic,
                               5 = Single or Double, 6 = Single or Aromatic,
                               7 = Double or Aromatic, 8 = Any
sss    Bond stereo             Single bonds: 0 = not stereo, 1 = Up, 4 = Either,
                               6 = Down. Double bonds: 0 = use x-, y-, z-coords
                               from atom block to determine cis or trans,
                               3 = cis or trans (either) double bond
xxx    Not used
rrr    Bond topology           0 = Either, 1 = Ring, 2 = Chain
ccc    Reacting center status
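To make the bond-line layout concrete, the sketch below parses the nine bonds of the connection table example and builds a neighbor list from them. It assumes three-character columns for each field. The bond tuples are copied from the example, and the names `parse_bond_line` and `neighbors` are illustrative.

```python
BOND_TYPES = {
    1: "single", 2: "double", 3: "triple", 4: "aromatic",
    5: "single or double", 6: "single or aromatic",
    7: "double or aromatic", 8: "any",
}


def parse_bond_line(line):
    """Return (first atom, second atom, bond type name, stereo code)."""
    first, second, bond_type, stereo = (int(line[i:i + 3]) for i in (0, 3, 6, 9))
    return first, second, BOND_TYPES[bond_type], stereo


# Bonds from the connection table example: (atom 1, atom 2, type, stereo).
raw_bonds = [(1, 2, 1, 0), (2, 8, 1, 0), (2, 3, 2, 3), (3, 4, 1, 0), (4, 5, 2, 0),
             (4, 6, 1, 0), (6, 7, 2, 3), (7, 8, 1, 0), (8, 9, 2, 0)]
# Re-create the fixed-width text lines, then parse them back.
bond_lines = ["".join(f"{v:3d}" for v in bond) for bond in raw_bonds]
bonds = [parse_bond_line(text) for text in bond_lines]

# Build an undirected neighbor list keyed by atom number.
neighbors = {}
for first, second, _, _ in bonds:
    neighbors.setdefault(first, []).append(second)
    neighbors.setdefault(second, []).append(first)

double_bonds = [(a, b) for a, b, kind, _ in bonds if kind == "double"]
print(double_bonds)
print(sorted(neighbors[2]))
```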
Mol File
A molfile consists of a header block and a connection table. The slide
shows a molfile for alanine with the following annotations:
[Figure: alanine structure and its annotated molfile]
• Header block: identifies the molfile: molecule name, user's name,
program, date, and other miscellaneous information and comments.
• Properties block, charge entries: atom 4 has charge +1 and atom 6 has
charge -1.
• Properties block, isotope entry: 1 entry for an isotope; atom 3 has
mass = 13.
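The charge and isotope annotations above correspond to property lines in the properties block. The sketch below reads such lines, assuming the V2000 `M  CHG` and `M  ISO` line syntax (an entry count followed by atom-number/value pairs); these line formats are not spelled out on the slide, so treat the sample lines as an assumption about how the alanine example is encoded.

```python
def parse_property_lines(lines):
    """Collect M  CHG and M  ISO entries as {atom_number: value} dictionaries."""
    charges, isotopes = {}, {}
    for line in lines:
        tokens = line.split()
        if tokens[:2] == ["M", "END"]:
            break  # M  END closes the properties block
        if len(tokens) < 3 or tokens[0] != "M" or tokens[1] not in ("CHG", "ISO"):
            continue  # skip property types this sketch does not handle
        count = int(tokens[2])
        pairs = tokens[3:3 + 2 * count]
        target = charges if tokens[1] == "CHG" else isotopes
        for atom, value in zip(pairs[0::2], pairs[1::2]):
            target[int(atom)] = int(value)
    return charges, isotopes


# Assumed encoding of the annotations: two charges and one isotope.
properties = [
    "M  CHG  2   4   1   6  -1",
    "M  ISO  1   3  13",
    "M  END",
]
charges, isotopes = parse_property_lines(properties)
print(charges)   # {4: 1, 6: -1}
print(isotopes)  # {3: 13}
```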
Representation of Stereochemistry
What is Stereochemistry ?
http://www.chemhelper.com/enantiomers.html
Representation of Stereochemistry: Atom Block
Representation of Stereochemistry: Bond Block
A bond stereo value of 1 marks a stereo bond pointing up.
RGfiles
In RGfiles, lines beginning with $ define the overall structure of the Rgroup query; the
molfile header block is embedded in the Rgroup header block. In addition to the
primary connection table (Ctab block) for the root structure, a Ctab block defines each
member (*m) within each Rgroup (*r).
Example of RGfile
SDfile
An SDfile (structure-data file) contains the structural information and associated data items for
one or more compounds.
*l is repeated for each line of data
*d is repeated for each data item
*c is repeated for each compound
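The nesting of compounds, data items, and data lines can be sketched as a small reader. It assumes the usual SDfile conventions that each compound ends with a `$$$$` line, each data item starts with a header line such as `> <NAME>`, and a blank line closes the item. The sample text and field names are made up for illustration.

```python
def read_sdf_records(text):
    """Split SDfile text into (molfile_text, data_items) tuples."""
    records, mol_lines, data = [], [], {}
    field, in_data = None, False
    for line in text.splitlines():
        if line.startswith("$$$$"):          # end of one compound (*c)
            records.append(("\n".join(mol_lines), data))
            mol_lines, data, field, in_data = [], {}, None, False
        elif line.startswith(">"):           # data header starts an item (*d)
            in_data = True
            field = line[line.find("<") + 1:line.rfind(">")]
            data[field] = []
        elif in_data:
            if line.strip() and field is not None:
                data[field].append(line)     # one line of data (*l)
            else:
                field = None                 # blank line closes the item
        else:
            mol_lines.append(line)           # still inside the molfile
    return records


# Hypothetical two-compound SDfile; names and values are illustrative only.
sample = "\n".join([
    "water", "  hypothetical", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0",
    "M  END",
    "> <NAME>", "water", "",
    "> <NOTES>", "line one", "line two", "",
    "$$$$",
    "empty", "", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "$$$$",
])
records = read_sdf_records(sample)
for molfile, items in records:
    print(molfile.splitlines()[0], items)
```

This sketch only handles data headers that carry a field name in angle brackets.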
Example of SDfile
RXNfile
Rxnfiles contain structural data for the reactants and
products of a reaction.
where:
*r is repeated for each reactant
*p is repeated for each product
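A minimal sketch of reading this layout follows. It assumes the common rxnfile structure: a `$RXN` line, three more header lines, a counts line with the number of reactants and products in two three-character fields, and then one `$MOL` line before each embedded molfile. The placeholder molfiles built by `tiny_molfile` are hypothetical and contain no atoms.

```python
def split_rxnfile(text):
    """Return (reactant molfiles, product molfiles) as lists of line lists."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("$RXN"):
        raise ValueError("not an rxnfile: missing $RXN line")
    n_reactants = int(lines[4][0:3])   # rrr
    n_products = int(lines[4][3:6])    # ppp
    molfiles, current = [], None
    for line in lines[5:]:
        if line.startswith("$MOL"):    # delimiter before each component
            current = []
            molfiles.append(current)
        elif current is not None:
            current.append(line)
    if len(molfiles) != n_reactants + n_products:
        raise ValueError("component count does not match the counts line")
    return molfiles[:n_reactants], molfiles[n_reactants:]


def tiny_molfile(name):
    """Hypothetical placeholder molfile with an empty connection table."""
    return [name, "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END"]


sample = "\n".join(
    ["$RXN", "hypothetical reaction", "", "", "  2  1"]
    + ["$MOL"] + tiny_molfile("reactant A")
    + ["$MOL"] + tiny_molfile("reactant B")
    + ["$MOL"] + tiny_molfile("product C")
)
reactants, products = split_rxnfile(sample)
print([m[0] for m in reactants], [m[0] for m in products])
```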
RXNfile example
RDfiles
• An RDfile (reaction-data file) consists of a set of editable "records". Each record
defines a molecule or reaction, and its associated data.
• The [RDfile Header] must occur at the beginning of the physical file and
identifies the file as an RDfile. A version stamp of 1 is given for future expansion
of the format.
• $DATM: Date/time (M/D/Y, H:m) stamp. This line is treated as a comment and
ignored when the file is read.
*d is repeated for each data item
*r is repeated for each reaction or molecule
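The record structure can be sketched as follows, assuming the usual RDfile keywords: `$RDFILE` for the header, `$RFMT` or `$MFMT` to start a reaction or molecule record, and `$DTYPE`/`$DATUM` pairs for data items. This sketch keeps only the first line of each datum, and the sample content is invented for illustration.

```python
def read_rdfile(text):
    """Split RDfile text into records with their structure lines and data items."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("$RDFILE"):
        raise ValueError("missing $RDFILE header")
    records, current, dtype = [], None, None
    for line in lines[1:]:
        if line.startswith("$DATM"):
            continue                              # date/time stamp is a comment
        if line.startswith(("$RFMT", "$MFMT")):   # new record (*r)
            kind = "reaction" if line.startswith("$RFMT") else "molecule"
            current = {"kind": kind, "structure": [], "data": {}}
            records.append(current)
            dtype = None
        elif current is None:
            continue                              # ignore anything before a record
        elif line.startswith("$DTYPE"):           # data item name (*d)
            dtype = line[len("$DTYPE"):].strip()
        elif line.startswith("$DATUM"):
            if dtype is not None:
                current["data"][dtype] = line[len("$DATUM"):].strip()
        elif dtype is None:
            current["structure"].append(line)     # rxnfile or molfile body
    return records


# Hypothetical file: one reaction record and one molecule record.
sample = "\n".join([
    "$RDFILE 1",
    "$DATM    10/17/91 10:41",
    "$RFMT",
    "$RXN",
    "hypothetical reaction",
    "$DTYPE yield",
    "$DATUM 70",
    "$MFMT",
    "water",
    "$DTYPE name",
    "$DATUM water",
])
records = read_rdfile(sample)
for record in records:
    print(record["kind"], record["data"])
```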
RDfile example
Mol2 files from TRIPOS
Originally from Tripos. Contains atom coordinates, bonds, and substructure information. This
format supports partial charges and isotopes.
• Lines 1, 2, 3, 5, and 6 are comments. They contain
the molecule name and information about the time
the molecule was created and last modified.
• Lines 8, 15, 28, and 41 in the example are Record
Type Indicators (RTIs). An RTI indicates the type
of data that follows in a .mol2 file.
• Lines 9-12, 16-27, 29-40, and 42 are all data
records.
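The split between comments, RTIs, and data records can be sketched as below. It assumes that comment lines start with `#` and that every RTI starts with `@<TRIPOS>`. The sample file is a made-up fragment, not the example shown on the slide.

```python
def split_mol2_sections(text):
    """Group data records under the RTI that precedes them."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>"):          # Record Type Indicator
            current = line[len("@<TRIPOS>"):].strip()
            sections.setdefault(current, [])
        elif current is None or line.startswith("#"):
            continue                              # comments and leading text
        elif line.strip():                        # simplification: skip blank lines
            sections[current].append(line)
    return sections


# Hypothetical two-atom fragment used only to exercise the function.
sample = "\n".join([
    "# Name: benzene fragment (hypothetical comment)",
    "@<TRIPOS>MOLECULE",
    "benzene",
    "2 1 1",
    "SMALL",
    "NO_CHARGES",
    "",
    "@<TRIPOS>ATOM",
    "1 C1 0.000 1.400 0.000 C.ar 1 BENZENE1 0.000",
    "2 C2 1.212 0.700 0.000 C.ar 1 BENZENE1 0.000",
    "@<TRIPOS>BOND",
    "1 1 2 ar",
])
sections = split_mol2_sections(sample)
print(list(sections))
```

Skipping blank lines keeps the sketch short, but it means an intentionally empty data line in the MOLECULE record would be lost.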
Parts of mol2 file
@<TRIPOS>MOLECULE
The first data line is the name of the molecule. The second data line contains the number of atoms, bonds,
substructures, features, and sets associated with the molecule. The third data line is the molecule type. The fourth data
line tells the type of charges associated with the molecule. The fifth data line contains the internal SYBYL status bits
associated with the molecule. The last data line contains any comment which may be associated with the molecule.
@<TRIPOS>ATOM
atom_id atom_name x y z atom_type [subst_id [subst_name [charge [status_bit]]]]
Example :
1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT
In the example above, the atom has ID number 1. It is named CA and is located at (-0.149, 0.299, 0.000). Its atom type is C.3. It
belongs to the substructure with ID 1, which is named ALA1. The charge associated with the atom is 0.000, and the SYBYL status
bits associated with the atom are BACKBONE, DICT, and DIRECT.
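The atom record above is whitespace-delimited, with nested optional fields at the end, so a simple split is enough for a sketch. The class name `Mol2Atom` and its field names are illustrative; only the field order comes from the format line.

```python
from typing import NamedTuple, Optional, Tuple


class Mol2Atom(NamedTuple):
    atom_id: int
    name: str
    xyz: Tuple[float, float, float]
    atom_type: str
    subst_id: Optional[int] = None
    subst_name: Optional[str] = None
    charge: Optional[float] = None
    status_bits: Tuple[str, ...] = ()


def parse_mol2_atom(line):
    """Parse one ATOM data record; trailing optional fields may be absent."""
    t = line.split()
    if len(t) < 6:
        raise ValueError("an atom record needs at least six fields")
    return Mol2Atom(
        atom_id=int(t[0]),
        name=t[1],
        xyz=(float(t[2]), float(t[3]), float(t[4])),
        atom_type=t[5],
        subst_id=int(t[6]) if len(t) > 6 else None,
        subst_name=t[7] if len(t) > 7 else None,
        charge=float(t[8]) if len(t) > 8 else None,
        # Status bits are joined with "|" in the record.
        status_bits=tuple(t[9].split("|")) if len(t) > 9 else (),
    )


atom = parse_mol2_atom(
    "1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT"
)
print(atom)
```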
@<TRIPOS>BOND
bond_id origin_atom_id target_atom_id bond_type [status_bits]
Example: 1 1 2 ar
The example bond has ID number 1 and connects atoms 1 and 2. It is an aromatic bond.
@<TRIPOS>SUBSTRUCTURE
subst_id subst_name root_atom [subst_type [dict_type [chain [sub_type [inter_bonds [status [comment]]]]]]]
Example: 1 BENZENE1 1 PERM 0 **** **** 0 ROOT
The substructure has ID 1 and is named BENZENE1. It is of type PERM and is associated with dictionary type 0. The SYBYL status
bits indicate that it is the ROOT substructure.
References
• http://www.tripos.com/data/support/mol2.pdf
• http://accelrys.com/products/informatics/cheminformatics/ctfile-formats/no-fee.php
• Arthur Dalby et al. Description of Several Chemical Structure File Formats Used by
Computer Programs Developed at Molecular Design Limited. J. Chem. Inf. Comput. Sci.
1992, 32, 244-255.
• http://www.chem.ucla.edu/harding/tutorials/stereochem/rsez.pdf
• http://www.chem.ucla.edu/harding/notes/notes_14C_stereo03.pdf

Chemical File Formats for storing chemical data

  • 2. Types of File formats
    Elsevier MDL supports a number of file formats for the representation and communication of chemical information.

    Name      Description
    molfiles  Each molfile describes a single molecular structure, which can contain disjoint fragments such as salts.
    SDfiles   Structure-data files, which contain data for any number of molecules. SDfiles are the primary format for large-scale data transfer between MDL databases.
    RGfiles   An RGfile describes a single molecular query with Rgroups. Each RGfile is a combination of Ctabs defining the root molecule and each member of each Rgroup in the query.
    rxnfiles  Reaction files. Each rxnfile contains the structural information for the reactants and products of a single reaction.
    RDfiles   Reaction-data files. The RDfile is a more general format that can include reactions as well as molecules.
  • 4. Connection Table [Ctab]
    A connection table (Ctab) contains information describing the structural relationships and properties of a collection of atoms. The connection table is fundamental to all of the MDL file formats.

    Counts line:
      9  9  0  0  0  0  0  0  0  0999 V2000
    Atom block:
       -1.0200    1.5300    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -0.5100    2.4100    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.5000    2.3900    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        1.0000    3.2700    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.0300    3.2700    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
        0.5000    4.1500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -0.5000    4.1500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -1.0100    3.2800    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       -2.0300    3.2800    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    Bond block:
      1  2  1  0
      2  8  1  0
      2  3  2  3
      3  4  1  0
      4  5  2  0
      4  6  1  0
      6  7  2  3
      7  8  1  0
      8  9  2  0
  • 5. Ctab Features

    Part of Ctab      Description
    Counts Line       Important specifications here relate to the number of atoms, bonds, and atom lists, the chiral flag setting, and the Ctab version.
    Atom Block        Specifies the atomic symbol and any mass difference, charge, stereochemistry, and associated hydrogens for each atom.
    Bond Block        Specifies the two atoms connected by the bond, the bond type, and any bond stereochemistry and topology (chain or ring properties) for each bond.
    Properties Block  Provides for future expandability of Ctab features, while maintaining compatibility with earlier Ctab configurations.
  • 6. 1. Counts Line
    aaabbblllfffcccsssmmmvvvvvv
    where
      • aaa = number of atoms (current max 255)* [Generic]
      • bbb = number of bonds (current max 255)* [Generic]
      • lll = number of atom lists (max 30)* [Query]
      • fff = (obsolete)
      • ccc = chiral flag: 0 = not chiral, 1 = chiral [Generic]
      • sss = number of stext entries [MDL ISIS/Desktop]
      • mmm = number of lines of additional properties, including the M END line. No longer supported; the default is set to 999. [Generic]
      • vvvvvv = Ctab version (V2000 in the examples here)
    Examples:
      • 6 5 0 0 1 0 3 V2000 shows six atoms, five bonds, the CHIRAL flag on, and three lines in the properties block.
      • 9 9 0 0 0 0 0 0 0 0999 V2000 shows 9 atoms, 9 bonds, and the CHIRAL flag off.
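Because every counts-line field has a fixed width, the line can be read by column position. The following is a minimal sketch, not a reference parser. It assumes the full V2000 layout, in which four further three-character fields (not listed on the slide) sit between sss and mmm, so mmm starts at column 30 and the version at column 33. The function name `parse_counts_line` is illustrative.

```python
def parse_counts_line(line: str) -> dict:
    """Read a V2000 counts line by fixed three-character columns."""

    def field(start: int) -> int:
        # Blank fields are treated as 0.
        text = line[start:start + 3].strip()
        return int(text) if text else 0

    return {
        "atoms": field(0),            # aaa
        "bonds": field(3),            # bbb
        "atom_lists": field(6),       # lll
        "chiral": field(12) == 1,     # ccc (fff at column 9 is obsolete)
        "stext_entries": field(15),   # sss
        "property_lines": field(30),  # mmm (assumed column, see lead-in)
        "version": line[33:39].strip(),
    }


# Counts line of the nine-atom example, built with explicit padding.
line = "  9  9" + "  0" * 8 + "999 V2000"
counts = parse_counts_line(line)
print(counts["atoms"], counts["bonds"], counts["chiral"], counts["version"])
```

Reading by column rather than splitting on whitespace matters here: adjacent fields such as `0999` are not separated by a space.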
  • 7. 2. Atom Block
    The Atom Block is made up of atom lines, one line per atom, with the following format:
    xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee

    Field  Meaning             Values
    x y z  Atom coordinates
    aaa    Atom symbol         Entry in periodic table, or L for atom list; A, Q, * for unspecified atom; LP for lone pair; R# for Rgroup label
    dd     Mass difference     -3, -2, -1, 0, 1, 2, 3, 4 (0 if value beyond these limits)
    ccc    Charge              0 = uncharged or value other than these, 1 = +3, 2 = +2, 3 = +1, 4 = doublet radical, 5 = -1, 6 = -2, 7 = -3
    sss    Atom stereo parity  0 = not stereo, 1 = odd, 2 = even, 3 = either or unmarked stereo center
    hhh    Hydrogen count + 1  1 = H0, 2 = H1, 3 = H2, 4 = H3, 5 = H4
    bbb    Stereo care box     0 = ignore stereo configuration of this double bond atom, 1 = stereo configuration of double bond atom must match
    vvv    Valence             0 = no marking (default), (1 to 14) = (1 to 14), 15 = zero valence
    HHH    H0 designator       0 = not specified, 1 = no H atoms allowed
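The atom line can be decoded the same way, by column. This sketch assumes the V2000 layout in which a single space separates the z coordinate from the three-character atom symbol, so the symbol occupies columns 31 to 33. The names `CHARGE_CODES` and `parse_atom_line` are illustrative, and only the first few fields are decoded.

```python
# Charge codes from the table above; code 4 is a doublet radical, not a charge.
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}


def parse_atom_line(line: str) -> dict:
    """Decode the leading fields of a V2000 atom line."""

    def int_field(start: int, end: int) -> int:
        text = line[start:end].strip()
        return int(text) if text else 0

    charge_code = int_field(36, 39)
    return {
        "x": float(line[0:10]),
        "y": float(line[10:20]),
        "z": float(line[20:30]),
        "symbol": line[31:34].strip(),          # aaa
        "mass_difference": int_field(34, 36),   # dd
        "charge": CHARGE_CODES.get(charge_code, 0),
        "doublet_radical": charge_code == 4,
        "stereo_parity": int_field(39, 42),     # sss
    }


# First atom of the nine-atom example, rebuilt with explicit field widths.
carbon_line = f"{-1.02:10.4f}{1.53:10.4f}{0.0:10.4f} {'C':<3}{0:2d}" + "  0" * 11
carbon = parse_atom_line(carbon_line)

# Hypothetical nitrogen with charge code 3, which decodes to +1.
nitrogen_line = f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {'N':<3}{0:2d}{3:3d}" + "  0" * 10
nitrogen = parse_atom_line(nitrogen_line)

print(carbon["symbol"], carbon["x"], carbon["charge"])
print(nitrogen["symbol"], nitrogen["charge"])
```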
  • 8. 3. Bond Block
    The Bond Block is made up of bond lines, one line per bond, with the following format:
    111222tttsssxxxrrrccc

    Field  Meaning             Values
    111    First atom number   1 - number of atoms
    222    Second atom number  1 - number of atoms
    ttt    Bond type           1 = Single, 2 = Double, 3 = Triple, 4 = Aromatic, 5 = Single or Double, 6 = Single or Aromatic, 7 = Double or Aromatic, 8 = Any
    sss    Bond stereo         Single bonds: 0 = not stereo, 1 = Up, 4 = Either, 6 = Down. Double bonds: 0 = use x-, y-, z-coords from atom block to determine cis or trans, 3 = cis or trans (either) double bond
    rrr    Bond topology       0 = Either, 1 = Ring, 2 = Chain
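A bond line is a run of three-character integer fields, so a small sketch can index them by position. The names `BOND_TYPES` and `parse_bond_line` are illustrative. Trailing fields that are absent, as in the short example lines earlier in the deck, are read as 0.

```python
BOND_TYPES = {
    1: "single", 2: "double", 3: "triple", 4: "aromatic",
    5: "single or double", 6: "single or aromatic",
    7: "double or aromatic", 8: "any",
}


def parse_bond_line(line: str) -> dict:
    """Decode a V2000 bond line laid out as 111222tttsssxxxrrrccc."""

    def field(index: int) -> int:
        # Field number `index` occupies three columns; missing means 0.
        text = line[3 * index:3 * index + 3].strip()
        return int(text) if text else 0

    return {
        "atom1": field(0),
        "atom2": field(1),
        "type": BOND_TYPES.get(field(2), "unknown"),
        "stereo": field(3),
        "topology": field(5),  # rrr follows the xxx field
    }


def make_line(*values: int) -> str:
    """Format integers as right-aligned three-character fields."""
    return "".join(f"{v:3d}" for v in values)


# Three bonds from the nine-atom example: 1-2, 2-3, and 4-5.
bonds = [parse_bond_line(make_line(*b)) for b in [(1, 2, 1, 0), (2, 3, 2, 3), (4, 5, 2, 0)]]
for bond in bonds:
    print(bond["atom1"], bond["atom2"], bond["type"], bond["stereo"])
```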
  • 9. Mol File
    A molfile consists of a header block and a connection table. The slide shows a molfile for alanine next to its structure, with the following annotations:
      • Header block: identifies the molfile: molecule name, user's name, program, date, and other miscellaneous information and comments.
      • Charge entries: atom 4: charge +1; atom 6: charge -1.
      • 1 entry for an isotope: atom 3: mass=13.
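The charge and isotope annotations correspond to lines in the properties block. As a hedged illustration, the sketch below assumes the V2000 property-line shape `M  CHG` or `M  ISO`, followed by an entry count and then atom-number/value pairs. The sample lines are reconstructed from the annotations above rather than copied from the slide, and `parse_property_line` is a hypothetical helper.

```python
def parse_property_line(line: str):
    """Return (kind, {atom_number: value}) for an 'M  XXX n a v ...' line."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "M":
        return None  # e.g. 'M  END' or a non-property line
    kind, count = tokens[1], int(tokens[2])
    values = [int(t) for t in tokens[3:3 + 2 * count]]
    # Even positions are atom numbers, odd positions the matching values.
    return kind, dict(zip(values[0::2], values[1::2]))


charges = parse_property_line("M  CHG  2   4   1   6  -1")
isotopes = parse_property_line("M  ISO  1   3  13")
end = parse_property_line("M  END")
print(charges, isotopes, end)
```

Splitting on whitespace is a simplification that works for these two property types; a stricter reader would use the fixed column widths instead.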
  • 10. Representation of Stereochemistry
    What is stereochemistry? http://www.chemhelper.com/enantiomers.html
  • 12. Representation of Stereochemistry: Bond Block
    A bond stereo value of 1 marks a stereo bond pointing up.
  • 13. RGfiles
    In RGfiles, lines beginning with $ define the overall structure of the Rgroup query; the molfile header block is embedded in the Rgroup header block. In addition to the primary connection table (Ctab block) for the root structure, a Ctab block defines each member (*m) within each Rgroup (*r).
  • 15. SDfile
    An SDfile (structure-data file) contains the structural information and associated data items for one or more compounds. In the format diagram:
      • *l is repeated for each line of data
      • *d is repeated for each data item
      • *c is repeated for each compound
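The three levels of repetition (compound, data item, data line) map onto a simple line-by-line reader. This sketch relies on details not spelled out on the slide, so treat them as assumptions: each compound ends with a `$$$$` line, each data item starts with a header line beginning with `>` that carries the field name in angle brackets, and a blank line ends the item. The function name and sample record are made up for illustration.

```python
import re


def parse_sdfile(text: str) -> list:
    """Split SDfile text into records of {'molfile': str, 'data': dict}."""
    records, mol_lines, data, field = [], [], {}, None
    in_data = False
    for line in text.splitlines():
        header = re.match(r">.*<([^>]+)>", line)
        if line.startswith("$$$$"):          # end of one compound (*c)
            joined = {k: "\n".join(v) for k, v in data.items()}
            records.append({"molfile": "\n".join(mol_lines), "data": joined})
            mol_lines, data, field, in_data = [], {}, None, False
        elif header:                          # start of one data item (*d)
            field = header.group(1)
            data[field] = []
            in_data = True
        elif in_data:
            if not line.strip():
                field = None                  # blank line closes the item
            elif field is not None:
                data[field].append(line)      # one line of data (*l)
        else:
            mol_lines.append(line)            # still inside the molfile
    return records


sample = "\n".join([
    "alanine", "  demo", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    ">  <NAME>", "L-Alanine", "",
    ">  <FORMULA>", "C3H7NO2", "",
    "$$$$",
    "water", "  demo", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    ">  <NAME>", "Water", "",
    "$$$$",
]) + "\n"

records = parse_sdfile(sample)
print(len(records), records[0]["data"])
```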
  • 17. RXNfile
    Rxnfiles contain structural data for the reactants and products of a reaction. In the format diagram:
      • *r is repeated for each reactant
      • *p is repeated for each product
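A reader for this layout only needs the reactant and product counts and the boundaries between embedded molfiles. The sketch below assumes details that the slide does not state: a four-line header starting with `$RXN`, a fifth line holding the two counts as three-character fields, and a `$MOL` line before each molfile. `split_rxnfile` and the placeholder molfile bodies are hypothetical.

```python
def split_rxnfile(text: str):
    """Return (reactant_molfiles, product_molfiles) from rxnfile text."""
    lines = text.splitlines()
    if len(lines) < 5 or not lines[0].startswith("$RXN"):
        raise ValueError("not an rxnfile")
    n_reactants = int(lines[4][0:3])   # assumed 'rrr' field
    n_products = int(lines[4][3:6])    # assumed 'ppp' field

    molfiles = []
    for line in lines[5:]:
        if line.startswith("$MOL"):
            molfiles.append([])        # start a new embedded molfile
        elif molfiles:
            molfiles[-1].append(line)

    blocks = ["\n".join(m) for m in molfiles]
    if len(blocks) != n_reactants + n_products:
        raise ValueError("molfile count does not match the counts line")
    return blocks[:n_reactants], blocks[n_reactants:]


rxn_text = "\n".join([
    "$RXN", "", "  demo", "", f"{2:3d}{1:3d}",
    "$MOL", "reactant A", "M  END",
    "$MOL", "reactant B", "M  END",
    "$MOL", "product C", "M  END",
])
reactants, products = split_rxnfile(rxn_text)
print(len(reactants), len(products))
```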
  • 19. RDfiles
      • An RDfile (reaction-data file) consists of a set of editable "records". Each record defines a molecule or reaction and its associated data.
      • The [RDfile Header] must occur at the beginning of the physical file and identifies the file as an RDfile. A version stamp of 1 is given for future expansion of the format.
      • $DATM: Date/time (M/D/Y, c) stamp. This line is treated as a comment and ignored when the program reads the file.
    In the format diagram, *d is repeated for each data item and *r is repeated for each reaction or molecule.
  • 21. Mol2 files from TRIPOS
    Original from Tripos. Contains atom coordinates, bonds, and substructure information. This format supports partial charges and isotopes.
      • Lines 1, 2, 3, 5, and 6 of the example are comments. They contain the molecule name and information about when the molecule was created and last modified.
      • Lines 8, 15, 28, and 41 of the example are Record Type Indicators (RTIs). An RTI indicates the type of data that follows it in a .mol2 file.
      • Lines 9-12, 16-27, 29-40, and 42 are all data records.
  • 22. Parts of mol2 file
    @<TRIPOS>MOLECULE
      • The first data line is the name of the molecule.
      • The second data line contains the number of atoms, bonds, substructures, features, and sets associated with the molecule.
      • The third data line is the molecule type.
      • The fourth data line gives the type of charges associated with the molecule.
      • The fifth data line contains the internal SYBYL status bits associated with the molecule.
      • The last data line contains any comment associated with the molecule.
    @<TRIPOS>ATOM
      Format: atom_id atom_name x y z atom_type [subst_id [subst_name [charge [status_bit]]]]
      Example: 1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT
      In this example the atom has ID number 1. It is named CA and is located at (-0.149, 0.299, 0.000). Its atom type is C.3. It belongs to the substructure with ID 1, which is named ALA1. The charge associated with the atom is 0.000, and the SYBYL status bits associated with the atom are BACKBONE, DICT, and DIRECT.
    @<TRIPOS>BOND
      Format: bond_id origin_atom_id target_atom_id bond_type [status_bits]
      Example: 1 1 2 ar
      The example bond has ID number 1 and connects atoms 1 and 2. It is an aromatic bond.
    @<TRIPOS>SUBSTRUCTURE
      Format: subst_id subst_name root_atom [subst_type [dict_type [chain [sub_type [inter_bonds [status [comment]]]]]]]
      Example: 1 BENZENE1 PERM 0 **** ****** 0 ROOT
      The substructure has ID 1 and is named BENZENE1. It is of type PERM and is associated with dictionary type 0. The SYBYL status bits indicate that it is the ROOT substructure.
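Since mol2 records are free-format and grouped under RTI lines, a reader can split the file on `@<TRIPOS>` and then tokenise each data record on whitespace. This is a minimal sketch under stated assumptions: comment lines start with `#`, blank lines are skipped (which would lose an empty MOLECULE data line), and the second atom plus the MOLECULE values are made up for illustration. The function names are hypothetical.

```python
def split_mol2_sections(text: str) -> dict:
    """Group data records under the RTI (@<TRIPOS>...) that precedes them."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>"):
            current = line[len("@<TRIPOS>"):].strip()
            sections.setdefault(current, [])
        elif current is not None and line.strip() and not line.startswith("#"):
            sections[current].append(line)
    return sections


def parse_mol2_atom(line: str) -> dict:
    """Parse: atom_id atom_name x y z atom_type [subst_id [subst_name [charge [status_bit]]]]"""
    parts = line.split()
    atom = {
        "id": int(parts[0]),
        "name": parts[1],
        "xyz": tuple(float(v) for v in parts[2:5]),
        "type": parts[5],
    }
    # Optional trailing fields appear in a fixed order.
    converters = [("subst_id", int), ("subst_name", str),
                  ("charge", float), ("status_bits", lambda s: s.split("|"))]
    for (key, convert), value in zip(converters, parts[6:]):
        atom[key] = convert(value)
    return atom


sample = "\n".join([
    "# illustrative two-atom fragment",
    "@<TRIPOS>MOLECULE",
    "demo",
    "2 1 1",
    "SMALL",
    "NO_CHARGES",
    "@<TRIPOS>ATOM",
    "1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT",
    "2 CB 1.000 1.000 0.000 C.3 1 ALA1 0.000",
    "@<TRIPOS>BOND",
    "1 1 2 1",
])
sections = split_mol2_sections(sample)
atoms = [parse_mol2_atom(record) for record in sections["ATOM"]]
print(atoms[0]["name"], atoms[0]["xyz"], atoms[0]["status_bits"])
```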
  • 23. References
      • http://www.tripos.com/data/support/mol2.pdf
      • http://accelrys.com/products/informatics/cheminformatics/ctfile-formats/no-fee.php
      • Arthur Dalby et al. Description of Several Chemical Structure File Formats Used by Computer Programs Developed at Molecular Design Limited. J. Chem. Inf. Comput. Sci. 1992, 32, 244-255.
      • http://www.chem.ucla.edu/harding/tutorials/stereochem/rsez.pdf
      • http://www.chem.ucla.edu/harding/notes/notes_14C_stereo03.pdf