Databases
• A data structure that stores organized
information. Most databases contain multiple
tables, which may each include several different
fields.
• A database-management system (DBMS) is a
computer-software application that interacts with
end-users, other applications, and the database
itself to capture and analyze data. A general-
purpose DBMS allows the definition, creation,
querying, update, and administration of
databases.
Biological databases
• Libraries of life sciences information, collected from
scientific experiments, published literature, high-throughput
experiment technology, and computational
analysis. They contain information from research areas
including genomics, proteomics, metabolomics,
microarray gene expression, and phylogenetics.
Information contained in biological databases includes
gene function, structure, localization (both cellular and
chromosomal), clinical effects of mutations as well as
similarities of biological sequences and structures.
• Biological databases can be broadly classified into
sequence, structure and functional databases.
Biological databases
• A database contains files or tables, each containing
numerous records and fields
• Flat file: the simplest form, either a single large text file
or a collection of text files
• Relational: the commonest type, stores the data within a
number of tables (with records and fields).
Tables are linked to each other by a shared
field called a key
Bibliography
Flat file
Relational database model
The operators are written in query-specific languages based on relational algebra
Structured Query Language (SQL) is commonly used
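The idea of tables linked by a shared key and queried with SQL can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names, column names, accessions and lengths are made up, and no real biological database is assumed to use this schema.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE sequence (
        accession   TEXT PRIMARY KEY,
        organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
        length      INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO organism VALUES (?, ?)",
    [(1, "Saccharomyces cerevisiae"), (2, "Homo sapiens")],
)
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?)",
    [("SEQ0001", 1, 846), ("SEQ0002", 2, 1182), ("SEQ0003", 2, 393)],
)

# The join follows the shared key (organism_id) from one table to the other.
rows = conn.execute("""
    SELECT s.accession, o.name, s.length
    FROM sequence AS s
    JOIN organism AS o ON o.organism_id = s.organism_id
    WHERE o.name = ?
    ORDER BY s.accession
""", ("Homo sapiens",)).fetchall()

for accession, name, length in rows:
    print(accession, name, length)
```

A flat file would hold the same information as repeated text lines; the relational layout stores each organism once and refers to it by key.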
• XML (eXtensible Markup Language) is now a general tool for
storage of data and information. XHTML is an XML-based language; HTML is a
related markup language derived from SGML, not a subset of XML.
• The key feature is the use of identifiers called tags
• <title>Understanding Bioinformatics</title>
• A <publisher> tag can be defined and used to identify book publishers
• Extraction from an XML file is similar to database querying.
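Extraction from XML can be sketched with Python's standard xml.etree.ElementTree module. Only the title and publisher tags come from the slide; the enclosing library and book elements, the second book and both publisher names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# A made-up document; <library> and <book> are hypothetical wrapper tags.
doc = """
<library>
  <book>
    <title>Understanding Bioinformatics</title>
    <publisher>Example Press</publisher>
  </book>
  <book>
    <title>A Second Book</title>
    <publisher>Another Publisher</publisher>
  </book>
</library>
"""

root = ET.fromstring(doc)

# Pull every title, much like SELECT title FROM book.
titles = [book.findtext("title") for book in root.iter("book")]

# Filter on one field and return another, much like a WHERE clause.
publishers = [
    book.findtext("publisher")
    for book in root.iter("book")
    if book.findtext("title") == "Understanding Bioinformatics"
]

print(titles)
print(publishers)
```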
Databases
• Information system: The UBC library, Google, Entrez, SRS
• Query system: a list you look at, a catalogue, indexed files, SQL, grep
• Storage system: boxes, bookshelves, Oracle, MySQL, PC binary files, Unix text files
• Data: GenBank flat file, PDB file, interaction record, title of a book, book
Bioinformatics Information Space
July 17, 1999
• Nucleotide sequences: 4,456,822
• Protein sequences: 706,862
• 3D structures: 9,780
• Human Unigene Clusters: 75,832
• Maps and Complete Genomes: 10,870
• Different species node: 52,889
• dbSNP 6,377
• RefGenes 515
• human contigs > 250 kb 341
(4.9MB)
• PubMed records: 10,372,886
• OMIM records: 10,695
The challenge of the information space:
Feb 10, 2004
• Nucleotide records: 36,653,899
• Protein sequences: 4,436,362
• 3D structures: 19,640
• Interactions & complexes: 52,385
• Human Unigene Clusters: 118,517
• Maps and Complete Genomes: 6,948
• Different taxonomy nodes: 283,121
• Human dbSNP: 13,179,601
• Human RefSeq records: 22,079
• bp in human contigs > 5,000 kb (116): 2,487,920,000
• PubMed records: 12,570,540
• OMIM records: 15,138
Databases
• Primary (archival)
– GenBank/EMBL/DDBJ
– UniProt
– PDB
– Medline (PubMed)
– BIND
• Secondary (curated)
– RefSeq
– Taxon
– UniProt
– OMIM
– SGD
http://nar.oupjournals.org/content/vol31/issue1/
Tools of trade
for the “armchair scientist”
• Databases
– PubMed and other NCBI databases
– Biochemical databases
– Protein domain databases
– Structural databases
– Genome comparison databases
• Tools
– CDD / COGs
– VAST / FSSP
Distribution of the type of databases as classified at the NAR
database web site
Types of databases
• Archival or Primary Data
– Text: PubMed
– DNA Sequence: GenBank
– Protein Sequence: Entrez Proteins, TREMBL
– Protein Structures: PDB
• Curated or Processed Data
– DNA sequences : RefSeq, LocusLink, OMIM
– Protein Sequences: SWISS-PROT, PIR
– Protein Structures : SCOP, CATH, MMDB
– Genomes: Entrez Genomes, COGs
Nucleic Acids Research: Database Issue each January 1;
articles on ~100 different databases
The National Center for Biotechnology Information (NCBI)
• Created as a part of the National Library of Medicine,
National Institutes of Health in 1988
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information
• Tools: BLAST(1990), Entrez (1992)
• GenBank (1992)
• Free MEDLINE (PubMed, 1997)
• Other databases: dbEST, dbGSS, dbSTS,
MMDB, OMIM, UniGene, Taxonomy,
GeneMap, SAGE, LocusLink, RefSeq
What is GenBank?
• Archival nucleotide sequence database
• Sample slogans:
“Easy deposits, unlimited withdrawals, high
interest”, “All bases covered”, “Billions and billions
served”
• Data are shared nightly among three
collaborating databases:
• GenBank at NCBI - Bethesda, Maryland, USA
• DNA Database of Japan (DDBJ) at NIG -
Mishima, Japan
• European Molecular Biology Laboratory
Database (EMBL) at EBI - Hinxton, UK
Some guiding principles of working
with GenBank
• GenBank is a nucleotide-centric
view of the information space
• GenBank is a repository of all
publicly available sequences
• In GenBank, records are grouped
for various reasons
• Data in GenBank is only as good
as what you put in
NCBI databases and their links
[Figure: link diagram connecting Medline article abstracts, nucleotide sequences, protein sequences, 3-D structures (MMDB), genomes and taxonomy; links are labelled Word Weight, BLAST, VAST and Phylogeny]
PDB
• Protein Data Bank
– Protein and nucleic acid (NA)
3D structures
– Sequence
present
– YAFFF (yet another flat file format)
HEADER    LEUCINE ZIPPER                          15-JUL-93   1DGC      1DGC   2
COMPND    GCN4 LEUCINE ZIPPER COMPLEXED WITH SPECIFIC                   1DGC   3
COMPND   2 ATF/CREB SITE DNA                                            1DGC   4
SOURCE    GCN4: YEAST (SACCHAROMYCES CEREVISIAE); DNA: SYNTHETIC        1DGC   5
AUTHOR    T.J.RICHMOND                                                  1DGC   6
REVDAT   1   22-JUN-94 1DGC    0                                        1DGC   7
JRNL        AUTH   P.KONIG,T.J.RICHMOND                                 1DGC   8
JRNL        TITL   THE X-RAY STRUCTURE OF THE GCN4-BZIP BOUND TO        1DGC   9
JRNL        TITL 2 ATF/CREB SITE DNA SHOWS THE COMPLEX DEPENDS ON DNA   1DGC  10
JRNL        TITL 3 FLEXIBILITY                                          1DGC  11
JRNL        REF    J.MOL.BIOL. V. 233 139 1993                          1DGC  12
JRNL        REFN   ASTM JMOBAK UK ISSN 0022-2836 0070                   1DGC  13
REMARK   1                                                              1DGC  14
REMARK   2                                                              1DGC  15
REMARK   2 RESOLUTION. 3.0 ANGSTROMS.                                   1DGC  16
REMARK   3                                                              1DGC  17
REMARK   3 REFINEMENT.                                                  1DGC  18
REMARK   3   PROGRAM X-PLOR                                             1DGC  19
REMARK   3   AUTHORS BRUNGER                                            1DGC  20
REMARK   3   R VALUE 0.216                                              1DGC  21
REMARK   3   RMSD BOND DISTANCES 0.020 ANGSTROMS                        1DGC  22
REMARK   3   RMSD BOND ANGLES 3.86 DEGREES                              1DGC  23
PDB
• HEADER
• COMPND
• SOURCE
• AUTHOR
• DATE
• JRNL
• REMARK
• SEQRES
• ATOM COORDINATES
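Because a PDB flat file puts the record name in the first six columns of each line, grouping lines by record type takes only a few lines of code. The sketch below uses a handful of lines written in the style of the 1DGC header; the function name group_by_record is hypothetical, and this is not a full PDB parser.

```python
# A few header lines in the style of the 1DGC entry.
pdb_text = """\
HEADER    LEUCINE ZIPPER                          15-JUL-93   1DGC
COMPND    GCN4 LEUCINE ZIPPER COMPLEXED WITH SPECIFIC
COMPND   2 ATF/CREB SITE DNA
AUTHOR    T.J.RICHMOND
REMARK   2 RESOLUTION. 3.0 ANGSTROMS.
"""


def group_by_record(text):
    """Group PDB lines by record name (the first six columns)."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name = line[:6].strip()
        records.setdefault(name, []).append(line[6:].strip())
    return records


records = group_by_record(pdb_text)
for name, lines in records.items():
    print(name, len(lines))
```

Continuation lines such as the second COMPND line keep their continuation number in the stored text; a real parser would strip it and join the pieces.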
Accessing
information on
molecular sequences
Page 26
GenBank Record
[Figure: annotated GenBank record with callouts for Locus Name, Accession Number, gi Number, Medline ID, GenPept ID, Protein Sequence and Nucleotide Sequence; most of the protein and nucleotide sequence deleted for brevity]
LOCUS, Accession, NID and protein_id
LOCUS: Unique string of 10 letters and numbers in the database. Not
maintained amongst databases, and is therefore a poor sequence
identifier.
ACCESSION: A unique identifier to that record, citable entity; does not
change when record is updated. A good record identifier, ideal for citation
in publication.
VERSION: Newer system in which the accession.version pair plays the same
role as the accession and gi number together; the version increments whenever the sequence changes.
Nucleotide gi: Geninfo identifier (gi), a unique integer which will change
every time the sequence changes.
PID: Protein Identifier: g, e or d prefix to gi number. Can have one or two
on one CDS.
Protein gi: Geninfo identifier (gi), a unique integer which will change every
time the sequence changes.
Protein_id: Identifier which has the same structure and function as the
nucleotide Accession.version numbers, but a slightly different format.
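The split between a stable accession and a changing version can be sketched as follows. The identifiers used here are invented for illustration, and split_accession_version is a hypothetical helper, not part of any NCBI toolkit.

```python
def split_accession_version(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version).

    The version is None when no numeric '.N' suffix is present.
    """
    accession, dot, version = identifier.rpartition(".")
    if dot and version.isdigit():
        return accession, int(version)
    return identifier, None


# Made-up identifiers: the accession stays the same across updates,
# while the version increases when the sequence itself changes.
old = split_accession_version("AB000001.1")
new = split_accession_version("AB000001.2")
bare = split_accession_version("AB000001")

print(old, new, bare)
same_record = old[0] == new[0]
sequence_changed = old[1] != new[1]
```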
Protein sequence motif is a descriptor of a protein family
• Glutamine amidotransferase class I
[PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA]
[C is the active site residue]
• Glutamine amidotransferase class II
<x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG]
[C is the active site residue]
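Patterns written in this PROSITE-style notation map closely onto regular expressions. The sketch below assumes the usual conventions: elements are separated by "-", x means any residue, square brackets list allowed residues, curly braces list excluded residues, a count in parentheses repeats an element, and < or > anchor the pattern to the start or end of the sequence. The function name and the test sequences are made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.replace(" ", "").rstrip(".")
    regex = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        if element.endswith(")"):        # repeat count such as (3) or (0,11)
            element, _, count = element[:-1].partition("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."                   # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"   # excluded residues
        else:
            core = element               # single residue or [..] set
        regex.append(prefix + core + repeat + suffix)
    return "".join(regex)


class_one = prosite_to_regex(
    "[PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA]"
)
class_two = prosite_to_regex("<x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG]")

# Made-up sequences, chosen only to exercise the two patterns.
hit_one = re.search(class_one, "MKSLLGLCLGAQALT")
hit_two = re.search(class_two, "MCGIVGA")
miss_two = re.search(class_two, "A" * 12 + "CGIVG")
```

The class II pattern only matches when the cysteine lies within the first twelve residues, which is what the leading `<x(0,11)` expresses.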
Searching MMDB
Principles of structural alignment
• Dali: http://www.ebi.ac.uk/dali/
Looks for minimal RMSD between Cα atoms.
Calculates Cα-Cα distance matrices, then
identifies the longest alignable segments
• VAST (Vector Alignment Search Tool)
http://www.ncbi.nlm.nih.gov/Structure/
Looks for pairs of secondary structure
elements (α-helices, β-strands) that have
similar orientation and connectivity
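A distance matrix is a convenient starting point for structure comparison because it does not change when a structure is rotated or translated. The sketch below is not the Dali algorithm; it only builds Cα-Cα distance matrices for two made-up three-residue fragments and compares them. The coordinates and function names are assumptions for illustration, and math.dist requires Python 3.8 or later.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise distances between points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def matrix_difference(m1, m2):
    """Mean absolute difference between two equally sized distance matrices."""
    n = len(m1)
    total = sum(abs(m1[i][j] - m2[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)


# Made-up C-alpha coordinates (in angstroms) for a three-residue fragment.
fragment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

# The same fragment rotated 90 degrees about z and then translated.
moved = [(10.0, 5.0, 1.0), (10.0, 8.8, 1.0), (6.2, 8.8, 1.0)]

# A fragment with a different shape (fully extended).
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

same_shape = matrix_difference(distance_matrix(fragment), distance_matrix(moved))
other_shape = matrix_difference(distance_matrix(fragment), distance_matrix(extended))
print(same_shape, other_shape)
```

The first difference is zero up to rounding, since only the position and orientation changed; the second is clearly larger because the internal geometry differs.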
Dali alignment of Tyr phosphatase
VAST Structure Neighbors
Structure Summary
Cn3D viewer
VAST neighbors
BLAST neighbors
Cn3D : Displaying Structures
Chloroquine
Structure Neighbors
Use of structural alignments
Chloroquine
NADH
UniProt
• New protein sequence database that is the result of the
merger of SWISS-PROT and PIR. It will be the
annotated, curated protein sequence database.
• Data in UniProt is primarily derived from coding
sequence annotations in EMBL (GenBank/DDBJ) nucleic
acid sequence data.
• UniProt is a Flat-File database just like EMBL and
GenBank
• Flat-File format is SwissProt-like, or EMBL-like
Swiss-Prot
• SWISS-PROT incorporates:
– Function of the protein
– Post-translational modifications
– Domains and sites
– Secondary structure
– Quaternary structure
– Similarities to other proteins
– Diseases associated with deficiencies in the protein
– Sequence conflicts, variants, etc.
