NCBI API – Integration into
analysis code
QBRC Tech Talk
Jiwoong Kim
Outline
• Introduction
• Usage Guidelines of the E-utilities
• Sample Applications of the E-utilities
NCBI & Entrez
• The National Center for
Biotechnology Information
advances science and health by
providing access to biomedical
and genomic information.
• Entrez is NCBI’s primary text
search and retrieval system
that integrates the PubMed
database of biomedical
literature with 39 other
literature and molecular
databases including DNA and
protein sequence, structure,
gene, genome, genetic
variation and gene expression.
E-utilities
• Entrez Programming Utilities
– The Entrez Programming Utilities (E-utilities) are a set of
eight server-side programs that provide a stable interface
into the Entrez query and database system at the NCBI.
– The E-utilities use a fixed URL syntax that translates a
standard set of input parameters into the values necessary
for various NCBI software components to search for and
retrieve the requested data.
Input: URL → E-utilities → Output: XML, FASTA, Text …
Usage Guidelines and Requirements
• Use the E-utility URL
– baseURL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ … (NCBI requires HTTPS)
– Python urllib/urlopen, Perl LWP::Simple, Linux wget, …
• Frequency, Timing and Registration of E-utility URL Requests
– Make no more than 3 requests per second → e.g. sleep(0.5) between requests
– Run large jobs on weekends or between 9 PM and 5 AM Eastern time on weekdays
– Include &tool and &email in all requests
• Minimizing the Number of Requests
– &retmax=500
• Handling Special Characters Within URLs
– Space → +, " → %22, # → %23
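The guidelines above can be sketched in Python roughly as follows. This is a minimal illustration, not NCBI's reference client: the helper names `build_eutils_url` and `Throttle`, and the placeholder `tool`/`email` values, are assumptions made for this example.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(program, tool="my_tool", email="user@example.org", **params):
    """Return an E-utility URL with &tool and &email attached.

    `tool` and `email` are placeholders; replace them with your own values.
    """
    query = dict(params, tool=tool, email=email)
    # urlencode turns spaces into "+", '"' into %22 and "#" into %23;
    # `safe` keeps Entrez field brackets and commas readable.
    return BASE_URL + program + ".fcgi?" + urlencode(query, safe="[]:,")


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval=0.5):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        # Sleep only for the part of the interval that has not yet elapsed.
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Calling `Throttle.wait()` before every request keeps a script at two requests per second with the default interval, which stays under the three-per-second limit.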
ESearch (text searches)
• Responds to a text query with the list of matching UIDs in a
given database (for later use in ESummary, EFetch or ELink),
along with the term translations of the query.
• Syntax: esearch.fcgi?db=<database>&term=<query>
– Input: Entrez database (&db); Any Entrez text query (&term)
– Output: List of UIDs matching the Entrez query
• Example: Get the PubMed IDs (PMIDs) for articles about
osteosarcoma
– https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]
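A short sketch of how the ESearch XML output might be parsed in Python. The sample document below is hand-written and trimmed for illustration (it is not a captured response); the function name `parse_esearch` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample in the shape of an ESearch reply (illustrative only).
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>3</Count>
  <RetMax>3</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>24450072</Id>
    <Id>24333720</Id>
    <Id>24333432</Id>
  </IdList>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Return (total hit count, list of UIDs on this page of results)."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count"))
    uids = [element.text for element in root.findall("./IdList/Id")]
    return count, uids


count, uids = parse_esearch(SAMPLE_ESEARCH_XML)
```

The count can exceed the number of returned UIDs, because `&retmax` limits how many UIDs come back per request.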
ESummary (document summary downloads)
• Responds to a list of UIDs from a given database with the
corresponding document summaries.
• Syntax: esummary.fcgi?db=<database>&id=<uid_list>
– Input: List of UIDs (&id); Entrez database (&db)
– Output: XML DocSums
• Example: Download DocSums for these PubMed IDs:
24450072, 24333720, 24333432
– https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=24450072,24333720,24333432
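When a UID list is long, one way to keep each URL manageable is to split it into batches. The sketch below assumes a batch size of 200 UIDs per request and uses the hypothetical helper names `chunked` and `esummary_urls`; adjust the size to your own needs.

```python
ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def chunked(uids, size):
    """Yield consecutive slices of `uids` with at most `size` items each."""
    for start in range(0, len(uids), size):
        yield uids[start:start + size]


def esummary_urls(db, uids, size=200):
    """Return one ESummary URL per batch of comma-separated UIDs."""
    return [
        f"{ESUMMARY}?db={db}&id={','.join(str(uid) for uid in batch)}"
        for batch in chunked(uids, size)
    ]


# The three PMIDs from the example above, split into batches of two.
urls = esummary_urls("pubmed", [24450072, 24333720, 24333432], size=2)
```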
EFetch (data record downloads)
• Responds to a list of UIDs in a given database with the
corresponding data records in a specified format.
• Syntax: efetch.fcgi?db=<database>&id=<uid_list>&rettype=<retrieval_type>&retmode=<retrieval_mode>
– Input: List of UIDs (&id); Entrez database (&db); Retrieval type
(&rettype); Retrieval mode (&retmode)
– Output: Formatted data records as specified
• Example: Download the abstract of PubMed ID 24333432
– https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24333432&rettype=abstract&retmode=text
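A minimal Python sketch of an EFetch call using `urllib`. The function name `efetch_text` and its `opener` parameter are assumptions of this example; `opener` exists only so the network call can be swapped out when testing offline.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_text(db, uids, rettype, retmode="text", opener=urlopen):
    """Download records for `uids` and return the response body as text."""
    params = {
        "db": db,
        "id": ",".join(str(uid) for uid in uids),
        "rettype": rettype,
        "retmode": retmode,
    }
    url = EFETCH + "?" + urlencode(params, safe=",")
    # urlopen returns a file-like response that works as a context manager.
    with opener(url) as response:
        return response.read().decode("utf-8")
```

For the abstract example above, the call would be `efetch_text("pubmed", [24333432], "abstract")`.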
ELink (Entrez links)
• Responds to a list of UIDs in a given database with either a list
of related UIDs (and relevancy scores) in the same database
or a list of linked UIDs in another Entrez database
• Checks for the existence of a specified link from a list of one
or more UIDs
• Creates a hyperlink to the primary LinkOut provider for a
specific UID and database, or lists LinkOut URLs and attributes
for multiple UIDs.
• Syntax: elink.fcgi?dbfrom=<source_db>&db=<destination_db>&id=<uid_list>
– Input: List of UIDs (&id); Source Entrez database (&dbfrom);
Destination Entrez database (&db)
– Output: XML containing linked UIDs from source and destination
databases
• Example: Find the Gene IDs linked to PubMed IDs 24333432 and 24314238, either as one combined set or as a separate set per PubMed ID
– One set (comma-separated &id): https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432,24314238
– Separate sets (repeated &id): https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432&id=24314238
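The difference between the two ELink forms is only in how `&id` is written, which can be sketched in Python with `urlencode`. The helper name `elink_url` and its `separate` flag are assumptions of this example.

```python
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(dbfrom, db, uids, separate=False):
    """Build an ELink URL.

    separate=False -> id=a,b     (one combined set of linked UIDs)
    separate=True  -> id=a&id=b  (one set of linked UIDs per input UID)
    """
    ids = [str(uid) for uid in uids]
    params = {"dbfrom": dbfrom, "db": db, "id": ids if separate else ",".join(ids)}
    # doseq=True expands a list value into repeated id=... parameters.
    return ELINK + "?" + urlencode(params, doseq=True, safe=",")


combined = elink_url("pubmed", "gene", [24333432, 24314238])
per_uid = elink_url("pubmed", "gene", [24333432, 24314238], separate=True)
```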
EGQuery (global query)
• Responds to a text query with the number of records
matching the query in each Entrez database.
• Syntax: egquery.fcgi?term=<query>
– Input: Entrez text query (&term)
– Output: XML containing the number of hits in each database.
• Example: Determine the number of records for mouse in
Entrez.
– https://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=mouse[orgn]&retmode=xml
ESpell (spelling suggestions)
• Retrieves spelling suggestions for a text query in a given
database.
• Syntax: espell.fcgi?term=<query>&db=<database>
– Input: Entrez text query (&term); Entrez database (&db)
– Output: XML containing the original query and spelling suggestions.
• Example: Find spelling suggestions for the PubMed Central query "osteosacoma".
– https://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi?term=osteosacoma&db=pmc
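A sketch of reading the suggestion out of an ESpell reply in Python. The sample XML is hand-written in the general shape of an ESpell response (element names `Query` and `CorrectedQuery`); treat both the sample and the helper name `parse_espell` as assumptions, and check them against a real response before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration; not a captured NCBI response.
SAMPLE_ESPELL_XML = """<eSpellResult>
  <Database>pmc</Database>
  <Query>osteosacoma</Query>
  <CorrectedQuery>osteosarcoma</CorrectedQuery>
</eSpellResult>"""


def parse_espell(xml_text):
    """Return (original query, suggested query or None if no suggestion)."""
    root = ET.fromstring(xml_text)
    original = root.findtext("Query")
    # An absent or empty CorrectedQuery element is treated as "no suggestion".
    corrected = root.findtext("CorrectedQuery") or None
    return original, corrected


original, corrected = parse_espell(SAMPLE_ESPELL_XML)
```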
EInfo (database statistics)
• Provides the number of records indexed in each field of a
given database, the date of the last update of the database,
and the available links from the database to other Entrez
databases.
• Syntax: einfo.fcgi?db=<database>
– Input: Entrez database (&db)
– Output: XML containing database statistics
• Example: Find database statistics for Entrez Protein.
– https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=protein
EPost (UID uploads)
• Accepts a list of UIDs from a given database, stores the set on
the History Server, and responds with a query key and web
environment for the uploaded dataset.
• Syntax: epost.fcgi?db=<database>&id=<uid_list>
– Input: List of UIDs (&id); Entrez database (&db)
– Output: Web environment (&WebEnv) and query key (&query_key)
parameters specifying the location on the Entrez history server of the
list of uploaded UIDs
• Example: Upload five Gene IDs (7173, 22018, 54314, 403521,
525013) for later processing.
– https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=gene&id=7173,22018,54314,403521,525013
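The query key and web environment returned by EPost are what later calls pass as `&query_key` and `&WebEnv`. A minimal Python sketch of extracting them follows; the sample XML is hand-written with a placeholder WebEnv string, and `parse_history` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; "NCID_EXAMPLE" is a placeholder, not a real WebEnv value.
SAMPLE_EPOST_XML = """<ePostResult>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_EXAMPLE</WebEnv>
</ePostResult>"""


def parse_history(xml_text):
    """Return (query_key, webenv) from a reply that carries History Server data."""
    root = ET.fromstring(xml_text)
    return root.findtext("QueryKey"), root.findtext("WebEnv")


query_key, webenv = parse_history(SAMPLE_EPOST_XML)

# The two values can then be appended to a follow-up request.
followup = f"esummary.fcgi?db=gene&query_key={query_key}&WebEnv={webenv}"
```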
Application 1
• Find human genes related to articles indexed with the major MeSH topic "Osteosarcoma", without MeSH explosion ([majr:noexp]) (PubMed → Gene)
1. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]&usehistory=y
2. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&query_key=1&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266&term=%22homo+sapiens%22[organism]&cmd=neighbor_history
3. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&query_key=3&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266
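The three requests chain together through the History Server: each reply supplies the `query_key` and `WebEnv` for the next call. A hedged Python sketch of that chaining is shown below. The function names are assumptions, `fetch` is any caller-supplied function that takes a URL and returns the response text, and the query key is read from each reply rather than hard-coded.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(program, **params):
    """Build an E-utility URL, keeping Entrez field brackets readable."""
    return BASE + program + ".fcgi?" + urlencode(params, safe="[]:")


def history_keys(xml_text):
    """Find the first QueryKey and WebEnv anywhere in a reply."""
    root = ET.fromstring(xml_text)
    return root.findtext(".//QueryKey"), root.findtext(".//WebEnv")


def pubmed_to_gene_summaries(term, organism, fetch):
    """ESearch (PubMed) -> ELink (PubMed to Gene) -> ESummary (Gene)."""
    # Step 1: store the PubMed hits on the History Server.
    key, env = history_keys(
        fetch(eutils_url("esearch", db="pubmed", term=term, usehistory="y"))
    )
    # Step 2: link to Gene, limited by organism, and store the linked set.
    key, env = history_keys(
        fetch(eutils_url("elink", dbfrom="pubmed", db="gene", query_key=key,
                         WebEnv=env, term=organism, cmd="neighbor_history"))
    )
    # Step 3: download document summaries for the linked Gene records.
    return fetch(eutils_url("esummary", db="gene", query_key=key, WebEnv=env))
```

Passing `fetch` in as a parameter keeps the rate limiting and error handling in one place and lets the pipeline be exercised with canned replies.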
Application 1
• Find human genes related to articles indexed with the major MeSH topic "Osteosarcoma", without MeSH explosion ([majr:noexp]) (PubMed → Gene)
– ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz
• It can be used instead of "ELink".
– ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
• It can be used instead of "ESummary".
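As a rough sketch of the bulk-file route, the Python below builds a PubMed ID to Gene ID lookup from gene2pubmed-style rows, assuming the tab-separated columns tax_id, GeneID and PubMed_ID with a header line starting with "#". The sample rows use made-up Gene IDs purely for illustration, and `genes_by_pmid` is a hypothetical helper name.

```python
import csv
from collections import defaultdict


def genes_by_pmid(lines, tax_id="9606"):
    """Map each PubMed ID to the set of Gene IDs for one taxon (9606 = human)."""
    mapping = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and the "#tax_id ..." header line.
        if not row or row[0].startswith("#"):
            continue
        tax, gene_id, pmid = row[:3]
        if tax == tax_id:
            mapping[pmid].add(gene_id)
    return mapping


# Made-up rows for illustration only (the Gene IDs are placeholders).
SAMPLE_ROWS = [
    "#tax_id\tGeneID\tPubMed_ID",
    "9606\t111\t24333432",
    "9606\t222\t24333432",
    "10090\t333\t24333432",
]
lookup = genes_by_pmid(SAMPLE_ROWS)

# For the real file, something like:
#   import gzip
#   with gzip.open("gene2pubmed.gz", "rt") as handle:
#       lookup = genes_by_pmid(handle)
```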
Application 2
• Find nucleotide sequences of "Burkholderia cepacia complex"
and download in GenBank format
1. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=%22burkholderia+cepacia+complex%22[organism]&usehistory=y
2. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&query_key=1&WebEnv=NCID_1_264773253_130.14.22.215_9001_1396244608_457974498&rettype=gb&retmode=text
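For a large result set, the EFetch step is usually split into batches with `&retstart` and `&retmax`. A small Python sketch of generating those batch URLs follows; the helper name `efetch_batch_urls` and the batch size of 500 are assumptions, and `total` is the hit count reported by the preceding ESearch.

```python
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_batch_urls(db, query_key, webenv, total, batch=500,
                      rettype="gb", retmode="text"):
    """Yield one EFetch URL per batch of records stored on the History Server."""
    for retstart in range(0, total, batch):
        yield (
            f"{EFETCH}?db={db}&query_key={query_key}&WebEnv={webenv}"
            f"&retstart={retstart}&retmax={batch}"
            f"&rettype={rettype}&retmode={retmode}"
        )


# "ENV" stands in for the WebEnv string returned by ESearch.
batch_urls = list(efetch_batch_urls("nuccore", "1", "ENV", total=1200))
```

Each URL can then be downloaded in turn, with a pause between requests, and the GenBank text appended to one output file.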
Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array" platform GEO Datasets
– PubMed branch ("cancer copy number" articles): esearch.fcgi?db=pubmed with the query cancer "copy number" → WebEnv, query_key → esummary.fcgi?db=pubmed → PubMed title; elink.fcgi?dbfrom=pubmed&db=gds → linked GEO DataSets UIDs
– GEO branch ("Affymetrix Genome-Wide Human SNP Array" platform GEO Datasets): esearch.fcgi?db=gds with the query Affymetrix "Genome-Wide" Human "SNP Array" AND gpl[Filter] → WebEnv, query_key → esummary.fcgi?db=gds → platforms GPL9704, GPL8226, GPL6804, GPL6801 → esearch.fcgi?db=gds for each platform
– Parsing: keep the GEO DataSets UIDs common to both branches → Result table
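The final "common" step is a set intersection between the GEO DataSets UIDs linked from each article and the UIDs found for the platforms. A minimal Python sketch, with placeholder IDs and the hypothetical helper name `articles_on_platform`:

```python
def articles_on_platform(pubmed_to_gds, platform_gds_ids):
    """Keep articles whose linked GEO DataSets UIDs overlap the platform UIDs.

    pubmed_to_gds: dict mapping a PMID to an iterable of linked GDS UIDs
    platform_gds_ids: iterable of GDS UIDs found for the platforms
    """
    wanted = set(platform_gds_ids)
    result = {}
    for pmid, gds_ids in pubmed_to_gds.items():
        common = wanted & set(gds_ids)
        if common:
            result[pmid] = sorted(common)
    return result


# Placeholder IDs for illustration only.
links = {"pmid_A": ["gds_1", "gds_2"], "pmid_B": ["gds_3"], "pmid_C": []}
table = articles_on_platform(links, ["gds_2", "gds_4"])
```

Joining `table` with the PubMed titles from ESummary gives the result table of the workflow.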
Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array"
platform GEO Datasets
Make custom scripts with XML-parser
EBot
• EBot is an interactive web tool that first allows
users to construct an arbitrary E-utility
analysis pipeline and then generates a Perl
script to execute the pipeline. The Perl script
can be downloaded and executed on any
computer with a Perl installation. For more
details, see the EBot page linked below.
– http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
Entrez Direct
• E-utilities on the UNIX Command Line
• Download from ftp://ftp.ncbi.nih.gov/entrez/entrezdirect/
• Entrez Direct Functions
– esearch performs a new Entrez search using terms in indexed fields.
– elink looks up neighbors (within a database) or links (between databases).
– efilter filters or restricts the results of a previous query.
– efetch downloads records or reports in a designated format.
– xtract converts XML into a table of data values.
– einfo obtains information on indexed fields in an Entrez database.
– epost uploads unique identifiers (UIDs) or sequence accession numbers.
– nquire sends a URL request to a web page or CGI service.
• Entering Query Commands
– esearch -db pubmed -query "opsin gene conversion" | elink -related
Links
• References
– Entrez Programming Utilities Help
• http://www.ncbi.nlm.nih.gov/books/NBK25501/
– Entrez Help
• http://www.ncbi.nlm.nih.gov/books/NBK3836/
• Useful Links
– Entrez Unique Identifiers (UIDs) for selected databases
• http://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.chapter2_table1/?report=objectonly
– Valid values of &retmode and &rettype for EFetch (null = empty string)
• http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly
– The full list of Entrez links
• http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html
NCBI databases
• Literature: PubMed, PubMed Central, NLM Catalog, MeSH, Books, Site
Search
• Health: PubMed Health, MedGen, GTR, dbGaP, ClinVar, OMIM, OMIA
• Organisms: Taxonomy
• Nucleotide Sequences: Nucleotide, GSS, EST, SRA, PopSet, Probe
• Genomes: Genome, Assembly, Epigenomics, UniSTS, SNP, dbVar,
BioProject, BioSample, Clone
• Genes: Gene, HomoloGene, UniGene, GEO Profiles, GEO DataSets
• Proteins: Protein, Conserved Domains, Protein Clusters, Structure
• Chemicals: PubChem Compound, PubChem Substance, PubChem BioAssay
• Pathways: BioSystems
E-utilities
• Eight server-side programs
– ESearch : Searching a Database
– EPost : Uploading UIDs to Entrez
– ESummary : Downloading Document Summaries
– EFetch : Downloading Full Records
– ELink : Finding Related Data Through Entrez Links
– EInfo : Getting Database Statistics and Search Fields
– EGQuery : Performing a Global Entrez Search
– ESpell : Retrieving Spelling Suggestions
Sample Applications of the E-utilities
• Basic pipelines
– ESearch - ESummary/EFetch
– EPost - ESummary/EFetch
– ELink - ESummary/EFetch
– ESearch - ELink - ESummary/EFetch
– EPost - ELink - ESummary/EFetch
– EPost - ESearch
– ELink - ESearch
Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array"
platform GEO Datasets
1. tr '\n' '\t' < cancer_copy_number.pubmed_result.txt | sed 's/\t\t/\n/g' | sed 's/^\t[0-9]*: //' | sed 's/\t/ /g' > cancer_copy_number.pubmed_result.oneLine.txt
2. sed 's/^.* PubMed *PMID: *//' cancer_copy_number.pubmed_result.oneLine.txt | sed 's/; .*//' | sed 's/.$//' > cancer_copy_number.pubmed_ids.txt
3. for id in $(cat cancer_copy_number.pubmed_ids.txt); do perl ~/scripts/elink.pl pubmed gds $id pubmed_gds | sed "s/^/$id\t/"; done > cancer_copy_number.pubmed_gds_ids.txt
4. awk -F'\t' '($1 == "Platform")' Affymetrix_Genome-Wide_Human_SNP_Array.gds_result.txt | cut -f2 | sed 's/^Accession: //' > Affymetrix_Genome-Wide_Human_SNP_Array.platform_accessions.txt
5. for platform in $(cat Affymetrix_Genome-Wide_Human_SNP_Array.platform_accessions.txt); do perl ~/scripts/esearch.pl gds $platform; done | sort -nu > Affymetrix_Genome-Wide_Human_SNP_Array.gds_ids.txt
6. paste cancer_copy_number.pubmed_ids.txt cancer_copy_number.pubmed_result.oneLine.txt | perl ~/scripts/table.addColumns.pl cancer_copy_number.pubmed_gds_ids.txt 0 - 0 1 | perl ~/scripts/table.search.pl Affymetrix_Genome-Wide_Human_SNP_Array.gds_ids.txt 0 - 1 | perl ~/scripts/table.mergeLines.pl -d ', ' - 0,2 > cancer_copy_number.Affymetrix_Genome-Wide_Human_SNP_Array.pubmed_gds.txt

Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 

E-Utilities

  • 1. NCBI API – Integration into analysis code QBRC Tech Talk Jiwoong Kim
  • 2. Outlines • Introduction • Usage Guidelines of the E-utilities • Sample Applications of the E-utilities
  • 3. NCBI & Entrez • The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information. • Entrez is NCBI’s primary text search and retrieval system that integrates the PubMed database of biomedical literature with 39 other literature and molecular databases including DNA and protein sequence, structure, gene, genome, genetic variation and gene expression.
  • 4. E-utilities • Entrez Programming Utilities – The Entrez Programming Utilities (E-utilities) are a set of eight server-side programs that provide a stable interface into the Entrez query and database system at the NCBI. – The E-utilities use a fixed URL syntax that translates a standard set of input parameters into the values necessary for various NCBI software components to search for and retrieve the requested data. • Diagram: URL (input) → E-utilities → XML, FASTA, Text … (output)
  • 5. Usage Guidelines and Requirements • Use the E-utility URL – baseURL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/ … – Python urllib/urlopen, Perl LWP::Simple, Linux wget, … • Frequency, Timing and Registration of E-utility URL Requests – Make no more than 3 requests per second → sleep(0.5) – Run large jobs on weekends or between 5 PM and 9 AM EST – Include &tool and &email in all requests • Minimizing the Number of Requests – &retmax=500 • Handling Special Characters Within URLs – Space → +, " → %22, # → %23
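The guidelines on this slide can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not NCBI's or the speaker's code: `build_url`, `Throttle`, and the placeholder `tool` and `email` values are hypothetical names chosen for illustration; only the base URL comes from the slide.

```python
import time
from urllib.parse import urlencode

# Base URL as given on the slide.
BASE_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_url(utility, tool="my_tool", email="me@example.org", **params):
    """Return an E-utility URL with &tool and &email appended.

    `tool` and `email` defaults are placeholders; replace them with your own.
    urlencode() escapes special characters (space -> +, " -> %22, # -> %23).
    """
    query = dict(params, tool=tool, email=email)
    return BASE_URL + utility + ".fcgi?" + urlencode(query)


class Throttle:
    """Keep consecutive requests at least `interval` seconds apart."""

    def __init__(self, interval=0.5):
        self.interval = interval
        self.last = None

    def wait(self):
        now = time.monotonic()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                time.sleep(remaining)
        self.last = time.monotonic()


if __name__ == "__main__":
    throttle = Throttle(interval=0.5)  # 2 requests per second, below the limit of 3
    for term in ['"osteosarcoma"[majr:noexp]', "mouse[orgn]"]:
        throttle.wait()
        print(build_url("esearch", db="pubmed", term=term, retmax=500))
```

Calling `throttle.wait()` before each request plays the role of the `sleep(0.5)` mentioned on the slide, while `retmax=500` follows the slide's advice on minimizing the number of requests.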
  • 7. ESearch (text searches) • Responds to a text query with the list of matching UIDs in a given database (for later use in ESummary, EFetch or ELink), along with the term translations of the query. • Syntax: esearch.fcgi?db=<database>&term=<query> – Input: Entrez database (&db); Any Entrez text query (&term) – Output: List of UIDs matching the Entrez query • Example: Get the PubMed IDs (PMIDs) for articles about osteosarcoma – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]
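To use the UID list from ESearch in analysis code, the XML reply has to be parsed. The sketch below assumes the usual `eSearchResult` layout (`Count`, `IdList/Id`); the sample XML is hand-written for illustration and is not real server output, and `fetch` and `parse_esearch` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hand-written sample in the shape of an ESearch XML reply (not real output).
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>3</Count>
  <RetMax>3</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>24450072</Id>
    <Id>24333720</Id>
    <Id>24333432</Id>
  </IdList>
</eSearchResult>"""


def fetch(url):
    """Download a URL and return its body as text (not called in this example)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_esearch(xml_text):
    """Return (total hit count, list of UIDs) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))
    ids = [node.text for node in root.findall("./IdList/Id")]
    return count, ids


if __name__ == "__main__":
    count, ids = parse_esearch(SAMPLE_ESEARCH_XML)
    print(count, ",".join(ids))
```

The comma-joined UID string is the form that the `&id` parameter of ESummary, EFetch and ELink expects.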
  • 9. ESummary (document summary downloads) • Responds to a list of UIDs from a given database with the corresponding document summaries. • Syntax: esummary.fcgi?db=<database>&id=<uid_list> – Input: List of UIDs (&id); Entrez database (&db) – Output: XML DocSums • Example: Download DocSums for these PubMed IDs: 24450072, 24333720, 24333432 – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=24450072,24333720,24333432
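A DocSum reply can be turned into table rows with a few lines of Python. This sketch assumes the older DocSum layout, in which each `DocSum` holds an `Id` and a series of `Item` elements identified by a `Name` attribute; the sample XML and its titles are placeholders written for this example, and `parse_docsums` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of an ESummary reply; titles are placeholders.
SAMPLE_ESUMMARY_XML = """<eSummaryResult>
  <DocSum>
    <Id>24450072</Id>
    <Item Name="Source" Type="String">Placeholder journal A</Item>
    <Item Name="Title" Type="String">Placeholder title A</Item>
  </DocSum>
  <DocSum>
    <Id>24333720</Id>
    <Item Name="Source" Type="String">Placeholder journal B</Item>
    <Item Name="Title" Type="String">Placeholder title B</Item>
  </DocSum>
</eSummaryResult>"""


def parse_docsums(xml_text, fields=("Title", "Source")):
    """Return one dict per DocSum with its UID and the requested Item fields."""
    rows = []
    for docsum in ET.fromstring(xml_text).findall("DocSum"):
        row = {"Id": docsum.findtext("Id")}
        for item in docsum.findall("Item"):
            name = item.get("Name")
            if name in fields:
                row[name] = item.text
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_docsums(SAMPLE_ESUMMARY_XML):
        # Tab-separated output is easy to combine with other tables later.
        print("\t".join([row["Id"], row.get("Title", ""), row.get("Source", "")]))
```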
  • 11. EFetch (data record downloads) • Responds to a list of UIDs in a given database with the corresponding data records in a specified format. • Syntax: efetch.fcgi?db=<database>&id=<uid_list>&rettype=<retrieval_type>&retmode=<retrieval_mode> – Input: List of UIDs (&id); Entrez database (&db); Retrieval type (&rettype); Retrieval mode (&retmode) – Output: Formatted data records as specified • Example: Download the abstract of PubMed ID 24333432 – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24333432&rettype=abstract&retmode=text
  • 12. ELink (Entrez links) • Responds to a list of UIDs in a given database with either a list of related UIDs (and relevancy scores) in the same database or a list of linked UIDs in another Entrez database • Checks for the existence of a specified link from a list of one or more UIDs • Creates a hyperlink to the primary LinkOut provider for a specific UID and database, or lists LinkOut URLs and attributes for multiple UIDs.
  • 13. ELink (Entrez links) • Syntax: elink.fcgi?dbfrom=<source_db>&db=<destination_db>&id=<uid_list> – Input: List of UIDs (&id); Source Entrez database (&dbfrom); Destination Entrez database (&db) – Output: XML containing linked UIDs from source and destination databases • Example: Find one set/separate sets of Gene IDs linked to PubMed IDs 24333432 and 24314238 – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432,24314238 – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432&id=24314238
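The two example URLs differ only in how `&id` is passed: a comma-separated list asks for one combined set, while repeated `&id` parameters ask for one set per source UID. The sketch below builds both forms and reads the `LinkSet` elements of a reply. The sample XML is hand-written, the Gene IDs in it (1111, 2222, 3333) are made up, and `elink_url` and `parse_linksets` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ELINK = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?"


def elink_url(dbfrom, db, ids, separate=False):
    """Build an ELink URL; separate=True repeats &id once per UID."""
    params = [("dbfrom", dbfrom), ("db", db)]
    if separate:
        params += [("id", uid) for uid in ids]
    else:
        params.append(("id", ",".join(ids)))
    # safe="," keeps commas readable instead of escaping them as %2C.
    return ELINK + urlencode(params, safe=",")


# Hand-written sample reply for the "separate sets" form; Gene IDs are made up.
SAMPLE_ELINK_XML = """<eLinkResult>
  <LinkSet>
    <DbFrom>pubmed</DbFrom>
    <IdList><Id>24333432</Id></IdList>
    <LinkSetDb>
      <DbTo>gene</DbTo>
      <LinkName>pubmed_gene</LinkName>
      <Link><Id>1111</Id></Link>
      <Link><Id>2222</Id></Link>
    </LinkSetDb>
  </LinkSet>
  <LinkSet>
    <DbFrom>pubmed</DbFrom>
    <IdList><Id>24314238</Id></IdList>
    <LinkSetDb>
      <DbTo>gene</DbTo>
      <LinkName>pubmed_gene</LinkName>
      <Link><Id>3333</Id></Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>"""


def parse_linksets(xml_text):
    """Return a list of (source UIDs, linked UIDs) pairs, one per LinkSet."""
    result = []
    for linkset in ET.fromstring(xml_text).findall("LinkSet"):
        sources = [n.text for n in linkset.findall("./IdList/Id")]
        targets = [n.text for n in linkset.findall("./LinkSetDb/Link/Id")]
        result.append((sources, targets))
    return result


if __name__ == "__main__":
    pmids = ["24333432", "24314238"]
    print(elink_url("pubmed", "gene", pmids))
    print(elink_url("pubmed", "gene", pmids, separate=True))
    print(parse_linksets(SAMPLE_ELINK_XML))
```

The separate form is the one to use when each linked Gene ID must stay traceable to the PubMed ID it came from.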
  • 15. EGQuery (global query) • Responds to a text query with the number of records matching the query in each Entrez database. • Syntax: egquery.fcgi?term=<query> – Input: Entrez text query (&term) – Output: XML containing the number of hits in each database. • Example: Determine the number of records for mouse in Entrez. – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=mouse[orgn]&retmode=xml
  • 17. ESpell (spelling suggestions) • Retrieves spelling suggestions for a text query in a given database. • Syntax: espell.fcgi?term=<query>&db=<database> – Input: Entrez text query (&term); Entrez database (&db) – Output: XML containing the original query and spelling suggestions. • Example: Find spelling suggestions for the PubMed query "osteosacoma". – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi?term=osteosacoma&db=pmc
  • 18. EInfo (database statistics) • Provides the number of records indexed in each field of a given database, the date of the last update of the database, and the available links from the database to other Entrez databases. • Syntax: einfo.fcgi?db=<database> – Input: Entrez database (&db) – Output: XML containing database statistics • Example: Find database statistics for Entrez Protein. – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=protein
  • 19. EPost (UID uploads) • Accepts a list of UIDs from a given database, stores the set on the History Server, and responds with a query key and web environment for the uploaded dataset. • Syntax: epost.fcgi?db=<database>&id=<uid_list> – Input: List of UIDs (&id); Entrez database (&db) – Output: Web environment (&WebEnv) and query key (&query_key) parameters specifying the location on the Entrez history server of the list of uploaded UIDs • Example: Upload five Gene IDs (7173, 22018, 54314, 403521, 525013) for later processing. – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=gene&id=7173,22018,54314,403521,525013
  • 20. Application 1 • Find human genes related to articles found with the non-extended MeSH term "Osteosarcoma" (PubMed → Gene) 1. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]&usehistory=y 2. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&query_key=1&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266&term=%22homo+sapiens%22[organism]&cmd=neighbor_history 3. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&query_key=3&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266
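The three URLs above are chained through the History Server: each reply returns a query key and a WebEnv string that the next request reuses. A minimal sketch of that chaining, assuming the replies carry `QueryKey` and `WebEnv` elements; the sample XML snippets and the `NCID_EXAMPLE` value are hand-written placeholders, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Hand-written placeholder replies (not real server output).
SAMPLE_ESEARCH_HISTORY = """<eSearchResult>
  <Count>2</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_EXAMPLE</WebEnv>
</eSearchResult>"""

SAMPLE_ELINK_HISTORY = """<eLinkResult>
  <LinkSet>
    <DbFrom>pubmed</DbFrom>
    <LinkSetDbHistory>
      <DbTo>gene</DbTo>
      <LinkName>pubmed_gene</LinkName>
      <QueryKey>3</QueryKey>
    </LinkSetDbHistory>
    <WebEnv>NCID_EXAMPLE</WebEnv>
  </LinkSet>
</eLinkResult>"""


def history_from_xml(xml_text):
    """Return (query_key, webenv) found anywhere in an E-utility reply."""
    root = ET.fromstring(xml_text)
    return root.findtext(".//QueryKey"), root.findtext(".//WebEnv")


def elink_history_url(query_key, webenv):
    """Step 2: link the stored PubMed set to human genes, keeping results on the server."""
    params = {
        "dbfrom": "pubmed",
        "db": "gene",
        "query_key": query_key,
        "WebEnv": webenv,
        "term": '"homo sapiens"[organism]',
        "cmd": "neighbor_history",
    }
    return BASE_URL + "elink.fcgi?" + urlencode(params)


def esummary_history_url(db, query_key, webenv):
    """Step 3: download DocSums for a set stored on the History Server."""
    params = {"db": db, "query_key": query_key, "WebEnv": webenv}
    return BASE_URL + "esummary.fcgi?" + urlencode(params)


if __name__ == "__main__":
    key, env = history_from_xml(SAMPLE_ESEARCH_HISTORY)
    print(elink_history_url(key, env))
    key, env = history_from_xml(SAMPLE_ELINK_HISTORY)
    print(esummary_history_url("gene", key, env))
```

Reading the query key from each reply, rather than hard-coding it, matters because the key passed to ESummary (3 on the slide) is whatever the ELink step reports.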
  • 21. Application 1 • Find human genes related to articles found with the non-extended MeSH term "Osteosarcoma" (PubMed → Gene) – ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz • It can be used instead of "ELink". – ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz • It can be used instead of "ESummary".
  • 22. Application 2 • Find nucleotide sequences of "Burkholderia cepacia complex" and download in GenBank format 1. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=%22burkholderia+cepacia+complex%22[organism]&usehistory=y 2. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&query_key=1&WebEnv=NCID_1_264773253_130.14.22.215_9001_1396244608_457974498&rettype=gb&retmode=text
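When the search in step 1 returns many records, the download in step 2 is usually split into batches with the `&retstart` and `&retmax` parameters. The sketch below only generates the batch URLs; the batch size of 500 mirrors the `&retmax=500` guideline earlier in the talk, `efetch_batch_urls` is a hypothetical helper, and `NCID_EXAMPLE` is a placeholder WebEnv.

```python
from urllib.parse import urlencode

EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"


def efetch_batch_urls(total, query_key, webenv, batch_size=500,
                      db="nuccore", rettype="gb", retmode="text"):
    """Return one EFetch URL per batch of a result set stored on the History Server.

    `total` is the Count value reported by the preceding ESearch call.
    """
    urls = []
    for start in range(0, total, batch_size):
        params = {
            "db": db,
            "query_key": query_key,
            "WebEnv": webenv,
            "rettype": rettype,
            "retmode": retmode,
            "retstart": start,     # index of the first record in this batch
            "retmax": batch_size,  # maximum number of records in this batch
        }
        urls.append(EFETCH + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in efetch_batch_urls(1200, "1", "NCID_EXAMPLE"):
        print(url)
```

Each URL would then be requested in turn, with a pause between requests to respect the rate limit.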
  • 23. Application 3 • Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array" platform GEO Datasets • Flowchart, PubMed branch: cancer "copy number" → esearch.fcgi?db=pubmed → WebEnv, query_key → esummary.fcgi?db=pubmed (PubMed title); elink.fcgi?dbfrom=pubmed&db=gds • Flowchart, GEO DataSets branch: Affymetrix "Genome-Wide" Human "SNP Array" AND gpl[Filter] → esearch.fcgi?db=gds → WebEnv, query_key → esummary.fcgi?db=gds → GPL9704, GPL8226, GPL6804, GPL6801 → esearch.fcgi?db=gds • Both branches: Common → Parsing → Result table
  • 24. "cancer copy number" articles "Affymetrix Genome-Wide Human SNP Array" platform GEO Datasets
  • 29. Make custom scripts with XML-parser
  • 30. EBot • EBot is an interactive web tool that first allows users to construct an arbitrary E-utility analysis pipeline and then generates a Perl script to execute the pipeline. The Perl script can be downloaded and executed on any computer with a Perl installation. For more details, see the EBot page linked below. – http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
  • 31. Entrez Direct • E-utilities on the UNIX Command Line • Download from ftp://ftp.ncbi.nih.gov/entrez/entrezdirect/ • Entrez Direct Functions – esearch performs a new Entrez search using terms in indexed fields. – elink looks up neighbors (within a database) or links (between databases). – efilter filters or restricts the results of a previous query. – efetch downloads records or reports in a designated format. – xtract converts XML into a table of data values. – einfo obtains information on indexed fields in an Entrez database. – epost uploads unique identifiers (UIDs) or sequence accession numbers. – nquire sends a URL request to a web page or CGI service. • Entering Query Commands – esearch -db pubmed -query "opsin gene conversion" | elink -related
  • 32. Links • References – Entrez Programming Utilities Help • http://www.ncbi.nlm.nih.gov/books/NBK25501/ – Entrez Help • http://www.ncbi.nlm.nih.gov/books/NBK3836/ • Useful Links – Entrez Unique Identifiers (UIDs) for selected databases • http://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.chapter2_table1/?report=objectonly – Valid values of &retmode and &rettype for EFetch (null = empty string) • http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly – The full list of Entrez links • http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html
  • 33. NCBI databases • Literature: PubMed, PubMed Central, NLM Catalog, MeSH, Books, Site Search • Health: PubMed Health, MedGen, GTR, dbGaP, ClinVar, OMIM, OMIA • Organisms: Taxonomy • Nucleotide Sequences: Nucleotide, GSS, EST, SRA, PopSet, Probe • Genomes: Genome, Assembly, Epigenomics, UniSTS, SNP, dbVar, BioProject, BioSample, Clone • Genes: Gene, HomoloGene, UniGene, GEO Profiles, GEO DataSets • Proteins: Protein, Conserved Domains, Protein Clusters, Structure • Chemicals: PubChem Compound, PubChem Substance, PubChem BioAssay • Pathways: BioSystems
  • 34. E-utilities • Eight server-side programs – ESearch : Searching a Database – EPost : Uploading UIDs to Entrez – ESummary : Downloading Document Summaries – EFetch : Downloading Full Records – ELink : Finding Related Data Through Entrez Links – EInfo : Getting Database Statistics and Search Fields – EGQuery : Performing a Global Entrez Search – ESpell : Retrieving Spelling Suggestions
  • 35. Sample Applications of the E-utilities • Basic pipelines – ESearch - ESummary/EFetch – EPost - ESummary/EFetch – ELink - ESummary/EFetch – ESearch - ELink - ESummary/EFetch – EPost - ELink - ESummary/EFetch – EPost - ESearch – ELink - ESearch
  • 36. Application 3 • Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array" platform GEO Datasets 1. tr '\n' '\t' < cancer_copy_number.pubmed_result.txt | sed 's/\t\t/\n/g' | sed 's/^\t[0-9]*: //' | sed 's/\t/ /g' > cancer_copy_number.pubmed_result.oneLine.txt 2. sed 's/^.* PubMed *PMID: *//' cancer_copy_number.pubmed_result.oneLine.txt | sed 's/; .*//' | sed 's/.$//' > cancer_copy_number.pubmed_ids.txt 3. for id in $(cat cancer_copy_number.pubmed_ids.txt); do perl ~/scripts/elink.pl pubmed gds $id pubmed_gds | sed "s/^/$id\t/"; done > cancer_copy_number.pubmed_gds_ids.txt 4. awk -F'\t' '($1 == "Platform")' Affymetrix_Genome-Wide_Human_SNP_Array.gds_result.txt | cut -f2 | sed 's/^Accession: //' > Affymetrix_Genome-Wide_Human_SNP_Array.platform_accessions.txt 5. for platform in $(cat Affymetrix_Genome-Wide_Human_SNP_Array.platform_accessions.txt); do perl ~/scripts/esearch.pl gds $platform; done | sort -nu > Affymetrix_Genome-Wide_Human_SNP_Array.gds_ids.txt 6. paste cancer_copy_number.pubmed_ids.txt cancer_copy_number.pubmed_result.oneLine.txt | perl ~/scripts/table.addColumns.pl cancer_copy_number.pubmed_gds_ids.txt 0 - 0 1 | perl ~/scripts/table.search.pl Affymetrix_Genome-Wide_Human_SNP_Array.gds_ids.txt 0 - 1 | perl ~/scripts/table.mergeLines.pl -d ', ' - 0,2 > cancer_copy_number.Affymetrix_Genome-Wide_Human_SNP_Array.pubmed_gds.txt
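The final join in step 6 relies on the speaker's own Perl scripts, which are not shown. As a rough Python equivalent of what that step appears to do (keep PubMed-to-GDS links whose GDS ID belongs to the platform's dataset list, then merge the hits per article), one might write the sketch below. The function name and all IDs and titles in the usage example are made up, and the behavior is inferred from the command line rather than taken from the scripts.

```python
from collections import defaultdict


def common_datasets(pubmed_gds_pairs, platform_gds_ids, titles):
    """Join PubMed-to-GDS links against the GDS IDs of the wanted platforms.

    pubmed_gds_pairs: iterable of (pmid, gds_id), as written by step 3
    platform_gds_ids: iterable of GDS IDs, as written by step 5
    titles: dict mapping pmid to the one-line PubMed summary from step 1
    Returns one (pmid, title, "gds1, gds2, ...") row per article with a hit.
    """
    wanted = set(platform_gds_ids)
    hits = defaultdict(list)
    for pmid, gds_id in pubmed_gds_pairs:
        # Keep only links into the platform's datasets, skipping repeats.
        if gds_id in wanted and gds_id not in hits[pmid]:
            hits[pmid].append(gds_id)
    return [(pmid, titles.get(pmid, ""), ", ".join(gds_ids))
            for pmid, gds_ids in hits.items()]


if __name__ == "__main__":
    # Made-up IDs and titles, for illustration only.
    pairs = [("111", "9001"), ("111", "9002"), ("222", "9003"), ("333", "9002")]
    platform_ids = ["9001", "9002"]
    titles = {"111": "Title A", "222": "Title B", "333": "Title C"}
    for row in common_datasets(pairs, platform_ids, titles):
        print("\t".join(row))
```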