Presented by – SWARUP MALAKAR
A database is a repository of sequences (DNA or amino acid) stored on a
computer, which provides a centralized and homogeneous view of its contents.
Put another way, it is a large collection of data pertaining to a specific topic, e.g.,
nucleotide sequences or protein sequences.
In essence, it is an electronic storage environment.
Databases are at the heart of bioinformatics.
1. Sequence databases: these hold the sequences of both proteins and nucleic
acids.
2. Structural databases: these hold three-dimensional structures, mainly of proteins.
In addition, databases are classified into three categories:
A. Primary databases B. Secondary databases C. Composite databases.
A primary database contains information on the sequence or structure alone, of either
protein or nucleic acid.
Examples: PIR and SWISS-PROT for protein sequences; GenBank (NCBI), EMBL and DDBJ for
nucleotide sequences.
PIR (Protein Information Resource): a resource of functionally annotated
protein sequences and structures.
PIR has collaborated with EBI and
SIB to establish UniProt (the
Universal Protein Resource),
the central resource of
protein sequence and function.
TREMBL
NCBI (National Center for Biotechnology Information):
- On Nov 4, 1988, the NCBI was established as a division of the National Library of Medicine for the
development of information systems in molecular biology.
- The NCBI is located in Bethesda, Maryland (U.S.A).
- NCBI maintains GenBank, an annotated collection of publicly available nucleotide sequences and
their protein translations.
- In 1988, the three partners (DDBJ, EMBL and GenBank) of the International Nucleotide
Sequence Database Collaboration met and agreed to use a common format.
i. Maintains collaboration with several NIH institutes, academia, industry and other governmental
agencies.
ii. Develops, distributes, supports and coordinates access to a variety of databases and software for
the scientific and medical communities.
iii. Develops and promotes standards for databases, data deposition and exchange, and biological
nomenclature.
iv. Engages the members of the international scientific community in informatics research and training
through the scientific visitors programs.
Link: https://www.ncbi.nlm.nih.gov/
- In 1992, NCBI took on responsibility for the GenBank DNA
sequence database.
- NCBI coordinates with individual laboratories and with other sequence
databases, such as those of EMBL and DDBJ.
- NCBI has since grown to provide other databases in
addition to GenBank.
- GenBank is a comprehensive sequence database that contains
publicly available DNA sequences for more than 119,000
different organisms, obtained through submissions of
sequence data from individual laboratories and batch submissions from
large-scale sequencing projects.
- Daily data exchange with the EMBL Data Library in the UK and
the DNA Data Bank of Japan helps ensure worldwide coverage.
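As a rough illustration of how a publicly available GenBank record can be requested programmatically, the sketch below builds a request URL for NCBI's E-utilities `efetch` service. It is a minimal sketch, not part of the original slides: the function name `genbank_efetch_url` is made up for this example, and the accession `U49845` is used only as a sample identifier. The code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_efetch_url(accession: str) -> str:
    """Build a URL asking for one nucleotide record as GenBank flat-file text.

    Hypothetical helper for illustration; it performs no network access.
    """
    params = {
        "db": "nuccore",     # nucleotide collection that includes GenBank
        "id": accession,     # accession number of the wanted record
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    print(genbank_efetch_url("U49845"))
```

Opening the printed URL in a browser, or passing it to an HTTP client, should return the flat-file entry for that accession, subject to NCBI's usage guidelines.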
- The EMBL database is developed and maintained by the European Molecular Biology Laboratory's European
Bioinformatics Institute (EMBL-EBI).
- It provides comprehensive nucleotide sequence information.
- The European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database is a
comprehensive collection of primary nucleotide sequences maintained at the European
Bioinformatics Institute (EBI).
- Link: http://www.ebi.ac.uk/embl/
- EMBL is supported by 22 member states, four prospect member states, and two associate member states.
- The laboratory operates from five sites: the main laboratory in Heidelberg, and
outstations in Hinxton (EBI, England), Grenoble (France), Hamburg (Germany) and
Monterotondo (near Rome).
- EMBL groups and laboratories perform basic research in molecular biology and molecular
medicine, and provide training for science students and visitors.
- Since 1982 the nucleotide database work has been done in collaboration with GenBank (NCBI, Bethesda, USA)
and the DNA Data Bank of Japan (Mishima).
- For sequence similarity searching, a variety of tools (such as FASTA and BLAST)
are available that allow external users to compare their own sequences against the data in
the EMBL Nucleotide Sequence Database and other databases.
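Similarity-search tools take the user's sequences as FASTA-formatted text. The sketch below shows how that format can be read and how two already aligned, equal-length sequences can be compared position by position. It is only an illustration under stated assumptions: the names `parse_fasta` and `percent_identity` and the sample sequences are made up, and the comparison is a naive identity count, not the FASTA or BLAST algorithm.

```python
def parse_fasta(text: str) -> dict:
    """Read FASTA text into a {record name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first word after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped; join them into one string.
            records[name] += line.upper()
    return records


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    sample = ">query1 hypothetical example\nACGTAC\nGT\n>subject1\nACGTTCGT\n"
    seqs = parse_fasta(sample)
    print(percent_identity(seqs["query1"], seqs["subject1"]))
```

Real search tools additionally handle gaps, scoring matrices and database indexing, which this sketch leaves out.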
- The DNA Data Bank of Japan (DDBJ) is a biological database that collects DNA
sequences. It was established in 1986.
- Link: https://www.ddbj.nig.ac.jp
- It is located at the National Institute of Genetics (NIG) in Shizuoka Prefecture,
Japan.
- DDBJ is a member of the International Nucleotide Sequence Database
Collaboration (INSDC).
- It exchanges its data daily with the European Molecular Biology Laboratory at the European
Bioinformatics Institute and with GenBank at the National Center for Biotechnology
Information.
- As a member of INSDC, the DDBJ Center collects nucleotide sequence data and provides
freely available nucleotide sequence data and a supercomputer system to support
research activities in life science.
FEATURES
- Group 1: biological source of the sequence (source). The "source" feature (group 1) is
mandatory for all entries in the international nucleotide sequence databases. ...
- Group 2: biological function features of the region. ...
- Group 3: differences and/or changes in the sequence data.
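The three feature groups above can be illustrated with a short sketch that pulls feature keys out of a flat-file feature table and assigns each key to a group. Everything here is an assumption made for illustration: the `FEATURE_GROUPS` mapping covers only a small subset of feature keys, the group assignments follow the grouping described above as understood here, the function names are hypothetical, and the sample excerpt is made up rather than taken from a real entry.

```python
# Illustrative subset of feature keys mapped to the three groups.
FEATURE_GROUPS = {
    "source": 1,                                   # group 1: biological source
    "gene": 2, "CDS": 2, "tRNA": 2, "rRNA": 2,     # group 2: biological function
    "exon": 2, "intron": 2,
    "variation": 3, "misc_difference": 3,          # group 3: differences/changes
    "old_sequence": 3, "unsure": 3,
}


def feature_keys(feature_table: str) -> list:
    """Return feature keys in order of appearance.

    Assumes the common flat-file layout: a feature key starts in column 6,
    while qualifier lines are indented further (to column 22).
    """
    keys = []
    for line in feature_table.splitlines():
        if len(line) > 5 and line[:5] == "     " and line[5] != " ":
            keys.append(line[5:].split()[0])
    return keys


def has_mandatory_source(keys: list) -> bool:
    """Check the rule that every entry carries a 'source' feature."""
    return "source" in keys


if __name__ == "__main__":
    # Made-up excerpt for illustration only.
    sample = (
        '     source          1..120\n'
        '                     /organism="Raphanus sativus"\n'
        '     CDS             10..99\n'
        '                     /product="hypothetical protein"\n'
        '     variation       42\n'
    )
    keys = feature_keys(sample)
    print(keys, [FEATURE_GROUPS.get(k) for k in keys], has_mandatory_source(keys))
```

Keys missing from the mapping come back as `None`, which makes it visible that the table is deliberately incomplete.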
Data type | Organism | Accession numbers for annotated sequences (number of entries) | Accession numbers for raw reads
Genome | Radish (Raphanus sativus cv. Aokubi S-h) | WGS: BAOO01000001-BAOO01072909 (72 909 entries); scaffold CON: DF196826-DF236948 (40 123 entries) | DRR012610-DRR012624
 | Soybean (Glycine max cv. Enrei) | BBNX02000001-BBNX02108601 (108 601 entries) | DRR021740-DRR021744
 | Common marmoset (Callithrix jacchus) | WGS: BBXK01000001-BBXK01109198 (109 198 entries); scaffold CON: DG000097-DG000120 (24 entries); GSS: LB274659-LB427105 (152 447 entries) | DRR036754-DRR036764
List of notable data sets released from the DNA Data Bank of Japan (DDBJ) sequence databases from June 2015 to May 2016
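The entry counts listed with the accession ranges can be checked by simple arithmetic: within one accession prefix, the number of entries is the last serial number minus the first, plus one. The sketch below is a minimal illustration of that calculation; the helper names `split_accession` and `entries_in_range` are made up for this example, and the code assumes both ends of a range share the same letter prefix and length.

```python
import re


def split_accession(accession: str) -> tuple:
    """Split an accession such as 'DF196826' into ('DF', 196826)."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", accession)
    if match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")
    return match.group(1), int(match.group(2))


def entries_in_range(first: str, last: str) -> int:
    """Count the accessions in an inclusive range with a shared prefix."""
    prefix_first, number_first = split_accession(first)
    prefix_last, number_last = split_accession(last)
    if prefix_first != prefix_last or len(first) != len(last):
        raise ValueError("range ends must share the same prefix and length")
    return number_last - number_first + 1


if __name__ == "__main__":
    # Ranges copied from the table of DDBJ data sets.
    print(entries_in_range("BAOO01000001", "BAOO01072909"))  # radish WGS
    print(entries_in_range("DF196826", "DF236948"))          # radish scaffold CON
    print(entries_in_range("LB274659", "LB427105"))          # marmoset GSS
```

For the WGS ranges the two-digit version number is part of the numeric field, but since it is identical at both ends of a range it cancels out in the subtraction.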
- DDBJ is hosted at the National Institute of Genetics.
- It collects data mainly from scientists in Japan, and also from sources all over the world, and shares this
nucleotide data with EMBL and GenBank.
- It is officially certified to collect nucleotide sequences from researchers and to issue
internationally recognized accession numbers to data submitters.
- About 99% of the INSDC nucleotide data from Japanese researchers is submitted through DDBJ.
- This database plays a major role in improving the quality of INSDC.
- Each database entry includes details of the sequence, submitter details, bibliographic
references, biological significance, and the scientific name and taxonomy of the organism.
- Features that identify coding regions, transcription units, mutation sites etc. are displayed
in a feature table.
Major activities of the database:
- Providing internationally recognized accession numbers for sequences.
- Bioinformatics database management, and developing tools for the analysis and visualization of
biological data.
- Conducting courses for beginners to reduce the complexity of biological data analysis.
Primary Databases.pptx