• Biological databases store data either as sequences or as molecular structures.
• Each database represents its data in its own file format.
• File formats fall into two categories:
  • Sequence database formats
  • Molecular database formats
Sequence file formats:
• GenBank flat-file format
• FASTA format
• Multi-FASTA format
• GCG format
• GCG-MSF format
• EMBL format
• Clustal format
• Swiss-Prot format
GenBank flat-file format
• Used by NCBI
• Each record is divided into three parts:
  • Header: a brief, precise introduction to the record
  • Features: the genes in the sequence, their locations in the genome, their protein products, coding regions, etc.
  • Sequence: starts at the ORIGIN keyword and ends with //, e.g. ORIGIN atcgatcgatgcgctat //
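The three-part layout can be sketched in Python as follows. This is a minimal illustration, not NCBI's parser: it assumes the FEATURES and ORIGIN keywords start at the beginning of a line, and the function name split_genbank_record and the shortened record are hypothetical.

```python
def split_genbank_record(text):
    """Split one GenBank-style record into header, features, and sequence text."""
    parts = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            current = "features"
        elif line.startswith("ORIGIN"):
            current = "sequence"
        elif line.startswith("//"):
            break  # end-of-record terminator
        parts[current].append(line)
    return {part: "\n".join(lines) for part, lines in parts.items()}


# Hypothetical, heavily shortened record used only for illustration.
record = "\n".join([
    "LOCUS       DEMO0001   17 bp   DNA   linear",
    "DEFINITION  Hypothetical demo record.",
    "FEATURES             Location/Qualifiers",
    "     source          1..17",
    "ORIGIN",
    "        1 atcgatcgat gcgctat",
    "//",
])

sections = split_genbank_record(record)
for part, content in sections.items():
    print("[" + part + "]")
    print(content)
```

Everything before FEATURES is treated as the header, so a record without a FEATURES line would end up entirely in the header part.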
GenBank flat-file format: record fields
• HEADER
  • Locus
  • Definition
  • Accession
  • Version
  • Dbsource: the source database record, which can include its creation and modification dates
  • Keywords
  • Source
    • Organism
  • References
    • Authors
    • Title
    • Journal
    • Medline ID: identifier of the published source
  • Comment
• FEATURES
• SEQUENCE
FASTA format
• One-line header
  • Starts with > followed by the name of the gene
• Sequence of the gene or protein
  • Blank spaces, paragraph marks (line breaks), and numerals are all ignored
• An asterisk (*) may mark the end of the sequence
>p53
ctcgaggggc ctagacattg ccctccagag agagcaccca acaccctcca ggcttgaccg
61 gccagggtgt ccccttccta ccttggagag agcagcccca gggcatcctg cagggggtgc
121 tgggacacca gctggccttc aaggtctctg cctccctcca gccaccccac tacacgctgc
181 tgggatcctg gatctcagct ccctggccga caacactggc aaactcctac tcatccacga
241 aggccctcct gggcatggtg gtccttccca gcctggcagt ctgttcctca cacaccttgt
301 tagtgcccag cccctgaggt tgcagctggg ggtgtctctg aagggctgtg agcccccagg
361 aagccctggg gaagtgcctg ccttgcctcc ccccggccct
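The FASTA reading rules can be sketched in Python as follows. This is a minimal illustration under two assumptions: the function name read_fasta_record is hypothetical, and the trailing asterisk in the sample was added here to show that it is dropped. Keeping only letters discards blank spaces, line breaks, numerals, and the asterisk in one step.

```python
def read_fasta_record(text):
    """Return (name, sequence) for a single FASTA record."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    name = lines[0][1:].strip()
    # Keep letters only: spaces, line breaks, numerals, and '*' are ignored.
    sequence = "".join(ch for line in lines[1:] for ch in line if ch.isalpha())
    return name, sequence


# First two sequence lines of the p53 example; the final '*' is an added assumption.
example = """>p53
ctcgaggggc ctagacattg ccctccagag agagcaccca acaccctcca ggcttgaccg
61 gccagggtgt ccccttccta ccttggagag agcagcccca gggcatcctg cagggggtgc *
"""

name, sequence = read_fasta_record(example)
print(name, len(sequence))
```

A stricter reader would also keep gap or ambiguity symbols if the sequences use them; this sketch deliberately keeps letters only.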
Multi-FASTA format
• An aggregation of FASTA records like the one above
• Multiple sequences follow one after the other in a single file
• Accepted by several tools, e.g.:
  • Clustal W
  • Multalin
>jhuma
gccagggtgt ccccttccta ccttggagag agcagcccca gggcatcctg cagggggtgc
>bhuma
gccagggtgt ccccttccta ccttggagag agcagcccca gggcatcctg cagggggtgc
>puma
gccagggtgt ccccttccta ccttggagag agcagcccca gggcatcctg cagggggtgc
>zuma
gccagggtgt ccccttccta ccttggagag agcagcccca gggcatcctg cagggggtgc
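Reading a multi-FASTA file comes down to starting a new record at every > line. The following Python sketch shows one way to do it; the function name read_multi_fasta is hypothetical and the sample sequences are shortened from the example.

```python
def read_multi_fasta(text):
    """Return a dict mapping each record name to its sequence, in file order."""
    chunks_by_name = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header line starts a new record.
            current = line[1:].strip()
            chunks_by_name[current] = []
        elif current is not None:
            chunks_by_name[current].append("".join(ch for ch in line if ch.isalpha()))
    return {name: "".join(chunks) for name, chunks in chunks_by_name.items()}


# Shortened sample; a second sequence line is added to the last record.
example = """>jhuma
gccagggtgt ccccttccta
>bhuma
gccagggtgt ccccttccta
ccttggagag
"""

records = read_multi_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

Note that in this sketch a repeated record name replaces the earlier record, so names are assumed to be unique within the file.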
GCG format
• GCG: Genetics Computer Group
• The first line identifies the sequence type:
  • !!NA_SEQUENCE 1.0 (nucleic acid)
  • !!AA_SEQUENCE 1.0 (amino acid)
• A simple format that gives just the sequence of a gene or protein
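Because the first line carries the sequence type, a file can be classified without reading further. The following Python sketch assumes the markers are spelled !!NA_SEQUENCE and !!AA_SEQUENCE; the names GCG_TYPES and gcg_sequence_type are hypothetical.

```python
# Assumed spelling of the first-line markers.
GCG_TYPES = {"!!NA_SEQUENCE": "nucleic acid", "!!AA_SEQUENCE": "amino acid"}


def gcg_sequence_type(text):
    """Return the sequence type named on the first line, or None if unrecognised."""
    lines = text.splitlines()
    if not lines:
        return None
    tokens = lines[0].split()
    if not tokens:
        return None
    return GCG_TYPES.get(tokens[0])


print(gcg_sequence_type("!!NA_SEQUENCE 1.0\n(rest of the file)"))
print(gcg_sequence_type("!!AA_SEQUENCE 1.0\n(rest of the file)"))
print(gcg_sequence_type(">p53\nctcgaggggc"))
```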
GCG-MSF format
• Holds multiple sequences: the sequence names, the sequences, and their alignment
• The word PileUp indicates that the file contains multiple sequences
• The mandatory word MSF marks the file as a GCG-MSF file rather than a plain GCG file
• Comments are terminated with //
• Two consecutive blank lines
• The multiple sequences follow
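The MSF layout can be read roughly as follows. This Python sketch assumes that the header declares each sequence on a "Name:" line, that a line containing only // ends the header, and that each alignment line starts with a sequence name. The function name read_msf, the sample file, and its zero checksums are hypothetical.

```python
def read_msf(text):
    """Return a dict mapping sequence names to their aligned sequences."""
    lines = text.splitlines()
    # The header ends at the line that contains only "//".
    divider = next(i for i, line in enumerate(lines) if line.strip() == "//")
    header, body = lines[:divider], lines[divider + 1:]
    if not any("MSF:" in line for line in header):
        raise ValueError("missing the mandatory MSF: keyword")
    # Sequence names are declared on "Name:" lines in the header.
    names = [line.split()[1] for line in header if line.strip().startswith("Name:")]
    alignment = {name: "" for name in names}
    for line in body:
        tokens = line.split()
        if tokens and tokens[0] in alignment:
            alignment[tokens[0]] += "".join(tokens[1:])
    return alignment


# Hypothetical sample file; the checksum values are placeholders.
example = "\n".join([
    "PileUp",
    "",
    " demo.msf  MSF: 14  Type: N  Check: 0  ..",
    "",
    " Name: seqA  Len: 14  Check: 0  Weight: 1.00",
    " Name: seqB  Len: 14  Check: 0  Weight: 1.00",
    "",
    "//",
    "",
    "seqA  acgtacgt.. acgt",
    "seqB  acgtacgttt acgt",
])

alignment = read_msf(example)
for name, aligned in alignment.items():
    print(name, aligned)
```

Lines in the body that do not start with a declared name, such as coordinate rulers, are skipped.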
EMBL format
• Sequence format of the European Molecular Biology Laboratory (EMBL) database
• Each entry starts with an ID (identification) line
• Each entry ends with // as terminator
• Each line type has its own format
• Used to record various forms of data, e.g. DNA, RNA, genes, proteins, etc.
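Since every entry runs from an ID line to a // terminator, a file with several entries can be split on those two markers. The following Python sketch assumes that sequence lines are indented and end with a running base count; the function name read_embl_entries and both sample entries are hypothetical.

```python
def read_embl_entries(text):
    """Yield (identifier, sequence) for every entry in EMBL-style text."""
    identifier, chunks = None, []
    for line in text.splitlines():
        if line.startswith("ID"):
            # The identifier is taken as the text before the first ';'.
            identifier, chunks = line[2:].split(";")[0].strip(), []
        elif line.startswith("//"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            identifier, chunks = None, []
        elif line.startswith(" ") and identifier is not None:
            # Indented sequence line: drop spaces and the trailing base count.
            chunks.append("".join(ch for ch in line if ch.isalpha()))


# Hypothetical entries used only for illustration.
example = "\n".join([
    "ID   DEMO1; SV 1; linear; genomic DNA; STD; UNC; 17 BP.",
    "DE   Hypothetical demo entry.",
    "SQ   Sequence 17 BP;",
    "     atcgatcgat gcgctat                                                17",
    "//",
    "ID   DEMO2; SV 1; linear; genomic DNA; STD; UNC; 4 BP.",
    "SQ   Sequence 4 BP;",
    "     acgt                                                               4",
    "//",
])

entries = list(read_embl_entries(example))
for identifier, sequence in entries:
    print(identifier, sequence)
```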
Clustal format
• Clustal is one of the most widely used sequence alignment tools
  • CLUSTAL W
  • CLUSTAL X
• The file holds aligned protein or gene sequences
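A Clustal alignment file can be read block by block. The following Python sketch assumes the usual layout: a first line beginning with the word CLUSTAL, then blocks in which each line holds a sequence name followed by a piece of the alignment, with an indented consensus line under each block. The function name read_clustal and the sample alignment are hypothetical.

```python
def read_clustal(text):
    """Return a dict mapping sequence names to their full aligned sequences."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("a Clustal alignment file starts with the word CLUSTAL")
    alignment = {}
    for line in lines[1:]:
        # Blank lines separate blocks; consensus lines start with whitespace.
        if not line.strip() or line[0].isspace():
            continue
        tokens = line.split()
        if len(tokens) >= 2:
            alignment[tokens[0]] = alignment.get(tokens[0], "") + tokens[1]
    return alignment


# Hypothetical two-block alignment used only for illustration.
example = "\n".join([
    "CLUSTAL W (1.83) multiple sequence alignment",
    "",
    "seqA            ACGTACGT--ACGT",
    "seqB            ACGTACGTTTACGT",
    "                ********  ****",
    "",
    "seqA            TTGA",
    "seqB            TTGA",
    "                ****",
])

aligned = read_clustal(example)
for name, sequence in aligned.items():
    print(name, sequence)
```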
Swiss-Prot format
• Protein sequence database
• Each line starts with a two-letter code:
  • ID: identification
  • AC: accession number
  • DE: description
  • GN: gene name
  • OS: organism species
  • OG: organelle
  • OC: organism classification
  • OX: organism taxonomy cross-reference
  • RN: reference number
  • RP: reference position
  • RC: reference comment
  • RX: reference cross-reference
  • RA: reference author
  • RT: reference title
  • RL: reference location
  • CC: comments
  • DR: database cross-reference
  • KW: keywords
  • FT: feature table
  • SQ: sequence
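The two-letter line codes make an entry easy to index. The following Python sketch groups the lines of one entry by code, assuming the code occupies the first two columns, the data starts at the sixth column, and the residues follow the SQ line on indented lines. The function name group_by_line_code, the "sequence" key, and the sample entry (including its accession) are hypothetical.

```python
from collections import defaultdict


def group_by_line_code(entry_text):
    """Group the lines of one Swiss-Prot style entry by their two-letter code."""
    fields = defaultdict(list)
    in_sequence = False
    for line in entry_text.splitlines():
        if line.startswith("//"):
            break  # end of the entry
        code, value = line[:2], line[5:].strip()
        if code.strip():
            fields[code].append(value)
            in_sequence = code == "SQ"
        elif in_sequence:
            # Indented lines after SQ carry the residues themselves.
            fields["sequence"].append(value.replace(" ", ""))
    return dict(fields)


# Hypothetical entry used only for illustration.
example = "\n".join([
    "ID   DEMO_HUMAN              Reviewed;          10 AA.",
    "AC   X00000;",
    "DE   Hypothetical demo protein.",
    "OS   Homo sapiens (Human).",
    "KW   Demo; Example.",
    "SQ   SEQUENCE   10 AA;",
    "     MKTAYIAKQR",
    "//",
])

fields = group_by_line_code(example)
print(fields["AC"], fields["OS"], "".join(fields["sequence"]))
```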
Sequence format conversion
• Several software tools have been designed by … ?
• These tools aim to convert one sequence format into another in detail
• Widely used tools for sequence interconversion include:
  • ReadSeq
  • GCG
  • SeqVerter
  • Seqret
ReadSeq
• Developed by Dr. D. G. Gilbert
• Automated conversion
• Supports 18 file formats that can be interconverted
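As a rough idea of what a format converter does, the following Python sketch turns a GenBank-style record into FASTA. It is not how ReadSeq, Seqret, or any other listed tool is implemented. It assumes the record name is the second token on the LOCUS line and that the sequence sits between ORIGIN and //. The function name genbank_to_fasta and the sample record are hypothetical.

```python
def genbank_to_fasta(record, width=60):
    """Convert one GenBank-style record to FASTA text wrapped at `width` letters."""
    name = "unnamed"
    chunks = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            tokens = line.split()
            if len(tokens) > 1:
                name = tokens[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end-of-record terminator
        elif in_origin:
            # Drop the position numbers and the spaces between blocks.
            chunks.append("".join(ch for ch in line if ch.isalpha()))
    sequence = "".join(chunks)
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(wrapped) + "\n"


# Hypothetical, heavily shortened record used only for illustration.
record = "\n".join([
    "LOCUS       DEMO0001   17 bp   DNA   linear",
    "ORIGIN",
    "        1 atcgatcgat gcgctat",
    "//",
])

fasta_text = genbank_to_fasta(record, width=10)
print(fasta_text)
```

Real converters also carry over annotations where the target format allows it; this sketch keeps only the name and the sequence.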
• FASTA
• Multi-FASTA
• Flat file
• GCG format
• EMBL
• Clustal
• Swiss-Prot
Make a file in each format by this Friday and send them as email attachments.

sequence of file formats in bioinformatics
