Entrez databases

Hafiz Muhammad Zeeshan Raza
Hafiz Muhammad Zeeshan RazaResearch Associate en COMSATS University Islamabad, Sahiwal Campus
Entrez Databases
Hafiz.M.Zeeshan.Raza
Research Assistant_HEC_NRPU
hafizraza26@gamil.com
COMSATS UNIVERSITY SAHIWAL
Overview
• Introduction
• Operations
• Interlinked resources
Introduction
• A major goal in developing databases is to provide efficient
and user friendly access to the data stored.
• There are a number of retrieval systems for biological data.
• The most popular retrieval systems for biological databases
are Entrez and Sequence Retrieval Systems (SRS) that
provide access to multiple databases for retrieval of
integrated search results.
Continue…
• To perform complex queries in a database often requires the use of
Boolean operators.
• This is to join a series of keywords using logical terms such as AND,
OR, and NOT to indicate relationships between the keywords used
in a search.
• AND means that the search result must contain both words; OR
means to search for results containing either word or both; NOT
excludes results containing either one of the words.
Continue…
• In addition, one can use parentheses ( ) to define a
concept if multiple words and relationships are
involved, so that the computer knows which part of the
search to execute first.
• Items contained within parentheses are executed first.
Quotes can be used to specify a phrase. Most search
engines of public biological databases use some form
of this Boolean logic.
Entrez
• The NCBI developed and maintains Entrez, a biological database retrieval
system.
• It is a gateway that allows text-based searches for a wide variety of data,
including annotated genetic sequence information, structural information,
as well as citations and abstracts, full papers, and taxonomic data.
• The key feature of Entrez is its ability to integrate information, which
comes from cross-referencing between NCBI databases based on
preexisting and logical relationships between individual entries.
Continue…
• This is highly convenient: users do not have to visit
multiple databases located in disparate places.
• For example, in a nucleotide sequence page, one may
find cross-referencing links to the translated protein
sequence, genome mapping data, or to the related
PubMed literature information, and to protein
structures if available.
Operation of Entrez
• Effective use of Entrez requires an understanding of the main
features of the search engine.
• There are several options common to all NCBI databases that help
to narrow the search.
• One option is “Limits,” which helps to restrict the search to a subset
of a particular database.
• It can also be set to restrict a search to a particular database (e.g.,
the field for author or publication date) or a particular type of data
(e.g., chloroplast DNA/RNA).
Continue…
• Another option is “Preview/Index,” which connects different searches with the
Boolean operators and uses a string of logically connected keywords to perform a
new search.
• The search can also be limited to a particular search field (e.g., gene name or
accession number).
• The “History” option provides a record of the previous searches so that the user
can review, revise, or combine the results of earlier searches.
• There is also a “Clipboard” that stores search results for later viewing for a limited
time. To store information in the Clipboard, the “Send to Clipboard” function
should be used.
Home Page of Entrez
PubMed to Entrez
• One of the databases accessible from Entrez is a biomedical literature
database known as PubMed, which contains abstracts and in some cases
the full text articles from nearly 4,000 journals.
• An important feature of PubMed is the retrieval of information based on
medical subject headings (MeSH) terms.
• The MeSH system consists of a collection of more than 20,000 controlled
and standardized vocabulary terms used for indexing articles.
• In other words, it is a thesaurus that helps convert search keywords into
standardized terms to describe a concept.
Continue…
• By doing so, it allows “smart” searches in which a group of accepted
synonyms are employed so that the user not only gets
exact matches, but also related matches on the same topic that otherwise
might have been missed.
• Another way to broaden the retrieval is by using the “Related Articles”
option. PubMed uses a word weight algorithm to identify related articles
with similar words in the titles, abstracts, and MeSH.
• By using this feature, articles on the same topic that were missed in the
original search can be retrieved.
Entrez databases
Entrez databases
Entrez databases
OMIM to Entrez
• Another unique database accessible from Entrez is Online Mendelian
Inheritance in Man (OMIM), which is a non-sequence-based database of
human disease genes and human genetic disorders.
• Each entry in OMIM contains summary information about
a particular disease as well as genes related to the disease.
• The text contains numerous hyperlinks to literature citations, primary
sequence records, as well as chromosome loci of the disease genes. The
database can serve as an excellent starting point to study
genes related to a disease.
OMIM Home Page
Sequence retrieval system (SRS)
• Sequence retrieval system is a retrieval system maintained by the
EBI, which is comparable to NCBI Entrez.
• It is not as integrated as Entrez, but allows the user to query
multiple databases simultaneously, another good example of
database integration.
• It also offers direct access to certain sequence analysis applications
such as sequence similarity searching and Clustal sequence
alignment.
Continue…
• Queries can be launched using “Quick Text Search” with only one query box in
which to enter information.
• There are also more elaborate submission forms, the “Standard Query Form” and
the “Extended Query Form.”
• The standard form allows four criteria (fields) to be used, which are linked by
Boolean operators.
• The extended form allows many more diversified criteria and fields to be used.
• The search results contain the query sequence and sequence annotation as well
as links to literature, metabolic pathways, and other biological databases.
Importance
• Databases are fundamental to modern biological research, especially to genomic
studies. The goal of a biological database is two fold: information retrieval and
knowledge discovery.
• Electronic databases can be constructed either as flat files, relational, or object
oriented. Flat files are simple text files and lack any form of organization to
facilitate information retrieval by computers.
• Relational databases organize data as tables and search information among tables
with shared features.
• Object-oriented databases organize data as objects and associate the objects
according to hierarchical relationships.
Continue…
• Biological databases need to be interconnected so that entries in one database can
be cross-linked to related entries in another database.
• NCBI databases accessible through Entrez are among the most integrated
databases. Effective information retrieval involves the use of Boolean operators.
• Entrez has additional user-friendly features to help conduct complex searches. One
such option is to use Limits, Preview/Index, and History to narrow down the
search space.
• Alternatively, one can use NCBI-specific field qualifiers to conduct searches. To
retrieve sequence information from NCBI GenBank, an understanding
of the format of GenBank sequence files is necessary.
Continue…
• It is also important to bear in mind that sequence data in these databases
are less than perfect.
• There are sequence and annotation errors. Biological databases are also
plagued by redundancy problems.
• There are various solutions to correct annotation and reduce redundancy,
for example, merging redundant sequences into a single entry or store
highly redundant sequences into a separate database.
Entrez databases
1 de 23

Recomendados

European molecular biology laboratory (EMBL) por
European molecular biology laboratory (EMBL)European molecular biology laboratory (EMBL)
European molecular biology laboratory (EMBL)Hafiz Muhammad Zeeshan Raza
30K vistas22 diapositivas
Gen bank databases por
Gen bank databasesGen bank databases
Gen bank databasesHafiz Muhammad Zeeshan Raza
33.7K vistas17 diapositivas
Introduction to NCBI por
Introduction to NCBIIntroduction to NCBI
Introduction to NCBIgeetikaJethra
57.2K vistas31 diapositivas
Ddbj por
DdbjDdbj
DdbjBioinformatics15
31.4K vistas31 diapositivas

Más contenido relacionado

La actualidad más candente

Blast and fasta por
Blast and fastaBlast and fasta
Blast and fastaALLIENU
23.5K vistas27 diapositivas
Protein data bank por
Protein data bankProtein data bank
Protein data bankYogesh Joshi
3.4K vistas19 diapositivas
Cath por
CathCath
CathRamya S
17.9K vistas13 diapositivas
MULTIPLE SEQUENCE ALIGNMENT por
MULTIPLE  SEQUENCE  ALIGNMENTMULTIPLE  SEQUENCE  ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENTMariya Raju
80.2K vistas44 diapositivas
methods for protein structure prediction por
methods for protein structure predictionmethods for protein structure prediction
methods for protein structure predictionkaramveer prajapat
66.5K vistas18 diapositivas
TrEMBL por
TrEMBLTrEMBL
TrEMBLAnkit Alankar
9K vistas28 diapositivas

La actualidad más candente(20)

Blast and fasta por ALLIENU
Blast and fastaBlast and fasta
Blast and fasta
ALLIENU23.5K vistas
Protein data bank por Yogesh Joshi
Protein data bankProtein data bank
Protein data bank
Yogesh Joshi3.4K vistas
Cath por Ramya S
CathCath
Cath
Ramya S17.9K vistas
MULTIPLE SEQUENCE ALIGNMENT por Mariya Raju
MULTIPLE  SEQUENCE  ALIGNMENTMULTIPLE  SEQUENCE  ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
Mariya Raju80.2K vistas
methods for protein structure prediction por karamveer prajapat
methods for protein structure predictionmethods for protein structure prediction
methods for protein structure prediction
karamveer prajapat66.5K vistas
Swiss prot database por sagrika chugh
Swiss prot databaseSwiss prot database
Swiss prot database
sagrika chugh27.7K vistas
Nucleic Acid Sequence databases por Pranavathiyani G
Nucleic Acid Sequence databasesNucleic Acid Sequence databases
Nucleic Acid Sequence databases
Pranavathiyani G64.8K vistas
sequence of file formats in bioinformatics por nadeem akhter
sequence of file formats in bioinformaticssequence of file formats in bioinformatics
sequence of file formats in bioinformatics
nadeem akhter25.9K vistas
Protein data bank por Alichy Sowmya
Protein data bankProtein data bank
Protein data bank
Alichy Sowmya33.4K vistas

Similar a Entrez databases

Data retreival system por
Data retreival systemData retreival system
Data retreival systemShikha Thakur
708 vistas54 diapositivas
Bioinformatics por
BioinformaticsBioinformatics
BioinformaticsShailendraSinghKhich
861 vistas22 diapositivas
Inverted files for text search engines por
Inverted files for text search enginesInverted files for text search engines
Inverted files for text search enginesunyil96
2K vistas56 diapositivas
Review of literature por
Review of literatureReview of literature
Review of literatureDr Usha (Physio)
1.1K vistas45 diapositivas
K3 edith falk_discoverytoolslibrary por
K3 edith falk_discoverytoolslibraryK3 edith falk_discoverytoolslibrary
K3 edith falk_discoverytoolslibraryevaminerva
496 vistas20 diapositivas
ENTREZ.ppt por
ENTREZ.pptENTREZ.ppt
ENTREZ.pptkishoreGupta17
392 vistas9 diapositivas

Similar a Entrez databases(20)

Inverted files for text search engines por unyil96
Inverted files for text search enginesInverted files for text search engines
Inverted files for text search engines
unyil962K vistas
K3 edith falk_discoverytoolslibrary por evaminerva
K3 edith falk_discoverytoolslibraryK3 edith falk_discoverytoolslibrary
K3 edith falk_discoverytoolslibrary
evaminerva496 vistas
INFORMATION SKILLS: NAVIGATING RESEARCH IN LIBRARY por Chris Okiki
INFORMATION SKILLS: NAVIGATING RESEARCH IN LIBRARYINFORMATION SKILLS: NAVIGATING RESEARCH IN LIBRARY
INFORMATION SKILLS: NAVIGATING RESEARCH IN LIBRARY
Chris Okiki287 vistas
Federated to library discovery platfoms por Nikesh Narayanan
Federated to library discovery platfomsFederated to library discovery platfoms
Federated to library discovery platfoms
Nikesh Narayanan361 vistas
K3 edith falk_discoverytoolslibrary por evaminerva
K3 edith falk_discoverytoolslibraryK3 edith falk_discoverytoolslibrary
K3 edith falk_discoverytoolslibrary
evaminerva486 vistas
Database and Research Matrix.pptx por RahulRoshan37
Database and Research Matrix.pptxDatabase and Research Matrix.pptx
Database and Research Matrix.pptx
RahulRoshan374K vistas
How to find information on the internet por Dalia El-Shafei
How to find information on the internetHow to find information on the internet
How to find information on the internet
Dalia El-Shafei277 vistas
Information storage and retrieval por Sadaf Rafiq
Information storage and retrievalInformation storage and retrieval
Information storage and retrieval
Sadaf Rafiq46.7K vistas
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ... por Amit Sheth
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Amit Sheth1.5K vistas
Biological databases por Afra Fathima
Biological databasesBiological databases
Biological databases
Afra Fathima1.3K vistas
Hinari basic course_module_2_workbook_2014_07 por Aslam Mehdi
Hinari basic course_module_2_workbook_2014_07Hinari basic course_module_2_workbook_2014_07
Hinari basic course_module_2_workbook_2014_07
Aslam Mehdi370 vistas
CS6007 information retrieval - 5 units notes por Anandh Arumugakan
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
Anandh Arumugakan7.6K vistas
Citation Database por Dr. Utpal Das
Citation Database Citation Database
Citation Database
Dr. Utpal Das1.4K vistas

Más de Hafiz Muhammad Zeeshan Raza

Car manufacturing is a complex and fascinating industry that plays a signific... por
Car manufacturing is a complex and fascinating industry that plays a signific...Car manufacturing is a complex and fascinating industry that plays a signific...
Car manufacturing is a complex and fascinating industry that plays a signific...Hafiz Muhammad Zeeshan Raza
11 vistas1 diapositiva
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ... por
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...Hafiz Muhammad Zeeshan Raza
166 vistas39 diapositivas
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA... por
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...Hafiz Muhammad Zeeshan Raza
201 vistas31 diapositivas
OMANTEL por
OMANTELOMANTEL
OMANTELHafiz Muhammad Zeeshan Raza
468 vistas11 diapositivas
Quality control of sequencing with fast qc obtained with por
Quality control of sequencing with fast qc obtained withQuality control of sequencing with fast qc obtained with
Quality control of sequencing with fast qc obtained withHafiz Muhammad Zeeshan Raza
743 vistas26 diapositivas
Cell organelles por
Cell organellesCell organelles
Cell organellesHafiz Muhammad Zeeshan Raza
1.8K vistas46 diapositivas

Más de Hafiz Muhammad Zeeshan Raza(13)

Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ... por Hafiz Muhammad Zeeshan Raza
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA... por Hafiz Muhammad Zeeshan Raza
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...

Último

AI Tools for Business and Startups por
AI Tools for Business and StartupsAI Tools for Business and Startups
AI Tools for Business and StartupsSvetlin Nakov
111 vistas39 diapositivas
When Sex Gets Complicated: Porn, Affairs, & Cybersex por
When Sex Gets Complicated: Porn, Affairs, & CybersexWhen Sex Gets Complicated: Porn, Affairs, & Cybersex
When Sex Gets Complicated: Porn, Affairs, & CybersexMarlene Maheu
73 vistas73 diapositivas
Gopal Chakraborty Memorial Quiz 2.0 Prelims.pptx por
Gopal Chakraborty Memorial Quiz 2.0 Prelims.pptxGopal Chakraborty Memorial Quiz 2.0 Prelims.pptx
Gopal Chakraborty Memorial Quiz 2.0 Prelims.pptxDebapriya Chakraborty
684 vistas81 diapositivas
MIXING OF PHARMACEUTICALS.pptx por
MIXING OF PHARMACEUTICALS.pptxMIXING OF PHARMACEUTICALS.pptx
MIXING OF PHARMACEUTICALS.pptxAnupkumar Sharma
82 vistas35 diapositivas
The Accursed House by Émile Gaboriau por
The Accursed House  by Émile GaboriauThe Accursed House  by Émile Gaboriau
The Accursed House by Émile GaboriauDivyaSheta
212 vistas15 diapositivas
On Killing a Tree.pptx por
On Killing a Tree.pptxOn Killing a Tree.pptx
On Killing a Tree.pptxAncyTEnglish
66 vistas11 diapositivas

Último(20)

AI Tools for Business and Startups por Svetlin Nakov
AI Tools for Business and StartupsAI Tools for Business and Startups
AI Tools for Business and Startups
Svetlin Nakov111 vistas
When Sex Gets Complicated: Porn, Affairs, & Cybersex por Marlene Maheu
When Sex Gets Complicated: Porn, Affairs, & CybersexWhen Sex Gets Complicated: Porn, Affairs, & Cybersex
When Sex Gets Complicated: Porn, Affairs, & Cybersex
Marlene Maheu73 vistas
The Accursed House by Émile Gaboriau por DivyaSheta
The Accursed House  by Émile GaboriauThe Accursed House  by Émile Gaboriau
The Accursed House by Émile Gaboriau
DivyaSheta212 vistas
On Killing a Tree.pptx por AncyTEnglish
On Killing a Tree.pptxOn Killing a Tree.pptx
On Killing a Tree.pptx
AncyTEnglish66 vistas
Psychology KS4 por WestHatch
Psychology KS4Psychology KS4
Psychology KS4
WestHatch90 vistas
Drama KS5 Breakdown por WestHatch
Drama KS5 BreakdownDrama KS5 Breakdown
Drama KS5 Breakdown
WestHatch87 vistas
Structure and Functions of Cell.pdf por Nithya Murugan
Structure and Functions of Cell.pdfStructure and Functions of Cell.pdf
Structure and Functions of Cell.pdf
Nithya Murugan701 vistas
Ch. 8 Political Party and Party System.pptx por Rommel Regala
Ch. 8 Political Party and Party System.pptxCh. 8 Political Party and Party System.pptx
Ch. 8 Political Party and Party System.pptx
Rommel Regala53 vistas
11.30.23 Poverty and Inequality in America.pptx por mary850239
11.30.23 Poverty and Inequality in America.pptx11.30.23 Poverty and Inequality in America.pptx
11.30.23 Poverty and Inequality in America.pptx
mary850239167 vistas
Ch. 7 Political Participation and Elections.pptx por Rommel Regala
Ch. 7 Political Participation and Elections.pptxCh. 7 Political Participation and Elections.pptx
Ch. 7 Political Participation and Elections.pptx
Rommel Regala105 vistas

Entrez databases

  • 3. Introduction • A major goal in developing databases is to provide efficient and user friendly access to the data stored. • There are a number of retrieval systems for biological data. • The most popular retrieval systems for biological databases are Entrez and Sequence Retrieval Systems (SRS) that provide access to multiple databases for retrieval of integrated search results.
  • 4. Continue… • To perform complex queries in a database often requires the use of Boolean operators. • This is to join a series of keywords using logical terms such as AND, OR, and NOT to indicate relationships between the keywords used in a search. • AND means that the search result must contain both words; OR means to search for results containing either word or both; NOT excludes results containing either one of the words.
  • 5. Continue… • In addition, one can use parentheses ( ) to define a concept if multiple words and relationships are involved, so that the computer knows which part of the search to execute first. • Items contained within parentheses are executed first. Quotes can be used to specify a phrase. Most search engines of public biological databases use some form of this Boolean logic.
  • 6. Entrez • The NCBI developed and maintains Entrez, a biological database retrieval system. • It is a gateway that allows text-based searches for a wide variety of data, including annotated genetic sequence information, structural information, as well as citations and abstracts, full papers, and taxonomic data. • The key feature of Entrez is its ability to integrate information, which comes from cross-referencing between NCBI databases based on preexisting and logical relationships between individual entries.
  • 7. Continue… • This is highly convenient: users do not have to visit multiple databases located in disparate places. • For example, in a nucleotide sequence page, one may find cross-referencing links to the translated protein sequence, genome mapping data, or to the related PubMed literature information, and to protein structures if available.
  • 8. Operation of Entrez • Effective use of Entrez requires an understanding of the main features of the search engine. • There are several options common to all NCBI databases that help to narrow the search. • One option is “Limits,” which helps to restrict the search to a subset of a particular database. • It can also be set to restrict a search to a particular database (e.g., the field for author or publication date) or a particular type of data (e.g., chloroplast DNA/RNA).
  • 9. Continue… • Another option is “Preview/Index,” which connects different searches with the Boolean operators and uses a string of logically connected keywords to perform a new search. • The search can also be limited to a particular search field (e.g., gene name or accession number). • The “History” option provides a record of the previous searches so that the user can review, revise, or combine the results of earlier searches. • There is also a “Clipboard” that stores search results for later viewing for a limited time. To store information in the Clipboard, the “Send to Clipboard” function should be used.
  • 10. Home Page of Entrez
  • 11. PubMed to Entrez • One of the databases accessible from Entrez is a biomedical literature database known as PubMed, which contains abstracts and in some cases the full text articles from nearly 4,000 journals. • An important feature of PubMed is the retrieval of information based on medical subject headings (MeSH) terms. • The MeSH system consists of a collection of more than 20,000 controlled and standardized vocabulary terms used for indexing articles. • In other words, it is a thesaurus that helps convert search keywords into standardized terms to describe a concept.
  • 12. Continue… • By doing so, it allows “smart” searches in which a group of accepted synonyms are employed so that the user not only gets exact matches, but also related matches on the same topic that otherwise might have been missed. • Another way to broaden the retrieval is by using the “Related Articles” option. PubMed uses a word weight algorithm to identify related articles with similar words in the titles, abstracts, and MeSH. • By using this feature, articles on the same topic that were missed in the original search can be retrieved.
  • 16. OMIM to Entrez • Another unique database accessible from Entrez is Online Mendelian Inheritance in Man (OMIM), which is a non-sequence-based database of human disease genes and human genetic disorders. • Each entry in OMIM contains summary information about a particular disease as well as genes related to the disease. • The text contains numerous hyperlinks to literature citations, primary sequence records, as well as chromosome loci of the disease genes. The database can serve as an excellent starting point to study genes related to a disease.
  • 18. Sequence retrieval system (SRS) • Sequence retrieval system is a retrieval system maintained by the EBI, which is comparable to NCBI Entrez. • It is not as integrated as Entrez, but allows the user to query multiple databases simultaneously, another good example of database integration. • It also offers direct access to certain sequence analysis applications such as sequence similarity searching and Clustal sequence alignment.
  • 19. Continue… • Queries can be launched using “Quick Text Search” with only one query box in which to enter information. • There are also more elaborate submission forms, the “Standard Query Form” and the “Extended Query Form.” • The standard form allows four criteria (fields) to be used, which are linked by Boolean operators. • The extended form allows many more diversified criteria and fields to be used. • The search results contain the query sequence and sequence annotation as well as links to literature, metabolic pathways, and other biological databases.
  • 20. Importance • Databases are fundamental to modern biological research, especially to genomic studies. The goal of a biological database is two fold: information retrieval and knowledge discovery. • Electronic databases can be constructed either as flat files, relational, or object oriented. Flat files are simple text files and lack any form of organization to facilitate information retrieval by computers. • Relational databases organize data as tables and search information among tables with shared features. • Object-oriented databases organize data as objects and associate the objects according to hierarchical relationships.
  • 21. Continue… • Biological databases need to be interconnected so that entries in one database can be cross-linked to related entries in another database. • NCBI databases accessible through Entrez are among the most integrated databases. Effective information retrieval involves the use of Boolean operators. • Entrez has additional user-friendly features to help conduct complex searches. One such option is to use Limits, Preview/Index, and History to narrow down the search space. • Alternatively, one can use NCBI-specific field qualifiers to conduct searches. To retrieve sequence information from NCBI GenBank, an understanding of the format of GenBank sequence files is necessary.
  • 22. Continue… • It is also important to bear in mind that sequence data in these databases are less than perfect. • There are sequence and annotation errors. Biological databases are also plagued by redundancy problems. • There are various solutions to correct annotation and reduce redundancy, for example, merging redundant sequences into a single entry or store highly redundant sequences into a separate database.