Navigating Genome Resources at NCBI
Part 1: The Evolution of the Human Genome Reference
                Deanna M. Church, NCBI




@deannachurch
NCBI

BLAST   PubMed   GenBank
ClinVar
Twenty Two Years of Growth: NCBI Data and User Services
[Figure: GenBank base pairs (millions; left axis, 0 to 140,000) and average users per weekday (right axis, 0 to 2,500,000) plotted for 1989 to 2011, annotated with NCBI resource launches, from BLAST, dbEST, and GenBank through to GTR, CloneDB, and the Genome Remapping Service.]
NCBI

Tools          Literature        Data
BLAST          PubMed            GenBank
GBench         PubMed Central    Protein DB
Splign         Bookshelf         SRA
Cn3D           MeSH              GEO
e-PCR          Gene Reviews      dbSNP
e-Utilities    …                 Gene
…                                RefSeq
                                 …
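Entrez: Pathway to Discovery
[Diagram: three linked data domains, MEDLINE abstracts, nucleotide sequences, and protein sequences. MEDLINE abstracts link to each other through term frequency statistics; both sequence domains link to MEDLINE through literature citations in sequence databases; nucleotide and protein sequences link to each other through coding region features; nucleotide sequences link to each other by nucleotide sequence similarity, and protein sequences by amino acid sequence similarity.]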
Programmatic access
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]&usehistory=y
<eSearchResult>
    <Count>6</Count>
    <RetMax>6</RetMax>
    <RetStart>0</RetStart>
    <IdList>
        <Id>19008416</Id>
        <Id>18927361</Id>
        <Id>18787170</Id>
        <Id>18487186</Id>
        <Id>18239126</Id>
        <Id>18239125</Id>
    </IdList>
    …
http://www.ncbi.nlm.nih.gov/books/NBK25501/
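
The ESearch request and response above could be handled from a script roughly as follows. This is a minimal sketch, not NCBI's own client: the helper names `build_esearch_url` and `parse_esearch_ids` are made up for illustration, the base URL uses `https` on the assumption that current NCBI servers expect it (the slide shows `http`), and the sample XML is the slide's truncated response closed off by hand so that it parses. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: https rather than the http shown on the slide.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, use_history=True):
    """Build an ESearch URL (hypothetical helper, not part of any NCBI library)."""
    params = {"db": db, "term": term}
    if use_history:
        params["usehistory"] = "y"
    # urlencode percent-encodes the brackets and turns spaces into '+'.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch_ids(xml_text):
    """Return (count, list of IDs) from an eSearchResult document."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count"))
    ids = [elem.text for elem in root.findall("./IdList/Id")]
    return count, ids


# The slide's response, with the elided tail dropped and the root element
# closed so the sample is well-formed; a real response has further elements.
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>6</Count>
  <RetMax>6</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>19008416</Id>
    <Id>18927361</Id>
    <Id>18787170</Id>
    <Id>18487186</Id>
    <Id>18239126</Id>
    <Id>18239125</Id>
  </IdList>
</eSearchResult>"""

url = build_esearch_url("pubmed", "science[journal] AND breast cancer AND 2008[pdat]")
count, ids = parse_esearch_ids(SAMPLE_RESPONSE)
print(url)
print(count, ids)
```

To run it against the live service, the URL could be fetched with `urllib.request.urlopen(url)` and the response body passed to `parse_esearch_ids`.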
http://www.youtube.com/NCBINLM   @NCBI   http://www.facebook.com/ncbi.nlm

http://www.ncbi.nlm.nih.gov/education/
Collins FS et al, 1998

Throughput: 500 Mb/year
Cost: < $0.25 per base
Variation: 100,000 SNPs mapped
Steve Sherry, NCBI

NCBI dbSNP database growth: human variations
[Figure: two panels covering 1999 to 2011. Upper panel: non-redundant annotations in millions of rs-ids (axis 10 to 60), split into SNP, STR & Indel, and ambiguous mapping. Lower panel: submissions by project in millions of submissions (axis 25 to 175), split into TSC, HapMap, 1000 Genomes, and other projects.]
dbSNP build 135. November 2011
Kidd et al, 2007 APOBEC cluster




Black: Deletion
White: Insertion
http://www.ncbi.nlm.nih.gov/dbvar
Church et al., 2011 PLoS




http://genomereference.org
GRC Beginnings


       Distributed data

    Old Assembly Model

Genome not in INSDC Database
Build sequence contigs from the components defined in the TPF (tiling path file):
    Check for orientation consistency
    Select switch points
    Instantiate sequence for further analysis

[Figure: overlapping components joined at a switch point to give the consensus sequence.]
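
The contig-building steps can be illustrated with a toy example. This is a sketch under stated assumptions, not the GRC pipeline: the `Component` record and `build_contig` function are hypothetical, each component is assumed to carry the span (in its own 1-based coordinates) that lies between its switch points, and the orientation check is reduced to validating the strand symbol.

```python
from dataclasses import dataclass
from typing import List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Component:
    """One tiling-path entry (hypothetical structure for illustration)."""
    accession: str
    sequence: str
    orientation: str  # "+" or "-"
    start: int        # first base used, 1-based, component coordinates
    end: int          # last base used, inclusive


def build_contig(path: List[Component]) -> str:
    """Instantiate a contig by joining each component's chosen span in order."""
    parts = []
    for comp in path:
        if comp.orientation not in ("+", "-"):
            raise ValueError(f"{comp.accession}: unknown orientation")
        if not 1 <= comp.start <= comp.end <= len(comp.sequence):
            raise ValueError(f"{comp.accession}: span outside component")
        span = comp.sequence[comp.start - 1:comp.end]
        # Components placed on the minus strand contribute the reverse complement.
        parts.append(span if comp.orientation == "+" else reverse_complement(span))
    return "".join(parts)


# Two made-up components that overlap by four bases ("CCCC") once the second
# is reverse-complemented; the switch point is placed in the middle of the overlap.
toy_path = [
    Component("COMP_A", "AAAACCCC", "+", 1, 6),
    Component("COMP_B", "TTGGGG", "-", 1, 4),
]
contig = build_contig(toy_path)
print(contig)
```

Because the switch point sits inside the overlap, each overlapping base is taken from exactly one component, so nothing is duplicated in the instantiated sequence.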
ftp://ftp.ncbi.nlm.nih.gov/pub/grc/human/
Community Input
Distributed data
      Centralized Data

    Old Assembly Model

Genome not in INSDC Database
Large-Scale Variation Complicates Genome Assembly

    Sequences from haplotype 1
    Sequences from haplotype 2

Old assembly model: compress into a consensus

New assembly model: represent both haplotypes
UGT2B17 Region

NCBI36 (hg18)

NCBI36 NC_000004.10 (chr4) Tiling Path
    Components: AC074378.4, AC079749.5, AC134921.2, AC147055.2, AC140484.1, AC019173.4, AC093720.2, AC021146.7
    Genes: TMPRSS11E, TMPRSS11E2

GRCh37 NC_000004.11 (chr4) Tiling Path
    Components: AC074378.4, AC079749.5, AC134921.1, AC147055.2, AC093720.2, AC021146.7
    Genes: TMPRSS11E

GRCh37: NT_167250.1 (UGT2B17 alternate locus)
    Components: AC074378.4, AC140484.1, AC019173.4, AC226496.2, AC021146.7
    Genes: TMPRSS11E2

Xue Y et al, 2008
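
One way to see what changed between the three UGT2B17 tiling paths is to compare their component sets. The sketch below is only an illustration: the accessions are those shown on the slide, their left-to-right order is read off the slide layout, versions are stripped before comparing, and the variable and function names are hypothetical.

```python
def base_accessions(path):
    """Strip the version suffix so AC134921.1 and AC134921.2 compare equal."""
    return {accession.split(".")[0] for accession in path}


NCBI36_CHR4 = ["AC074378.4", "AC079749.5", "AC134921.2", "AC147055.2",
               "AC140484.1", "AC019173.4", "AC093720.2", "AC021146.7"]
GRCH37_CHR4 = ["AC074378.4", "AC079749.5", "AC134921.1", "AC147055.2",
               "AC093720.2", "AC021146.7"]
GRCH37_ALT = ["AC074378.4", "AC140484.1", "AC019173.4", "AC226496.2",
              "AC021146.7"]

old_chr = base_accessions(NCBI36_CHR4)
new_chr = base_accessions(GRCH37_CHR4)
alt = base_accessions(GRCH37_ALT)

# Components on the NCBI36 chromosome that left the GRCh37 chromosome path
# and reappear on the alternate locus.
moved_to_alt = (old_chr - new_chr) & alt
# Components on the alternate locus that were not in the NCBI36 path at all.
new_in_alt = alt - old_chr
# Components shared by the GRCh37 chromosome path and the alternate locus.
shared_anchors = new_chr & alt

print(sorted(moved_to_alt))    # ['AC019173', 'AC140484']
print(sorted(new_in_alt))      # ['AC226496']
print(sorted(shared_anchors))  # ['AC021146', 'AC074378']
```

The shared components at either end are what tie the alternate locus back to its position on chr4.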
GRCh37 (hg19)

Regions with alternate loci: UGT2B17, MHC, MAPT

7 alternate haplotypes at the MHC

Alternate loci released as:
    FASTA
    AGP
    Alignment to chromosome
Assembly (e.g. GRCh37)
    PAR
    Primary Assembly
    Non-nuclear assembly unit (e.g. MT)
    Alternate loci assembly units: ALT 1 to ALT 9
    Genomic regions: MHC, UGT2B17, MAPT
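
The assembly model can be thought of as a small data structure: a primary assembly plus alternate-loci units, each offering a representation of one or more genomic regions. The sketch below is a toy model only. The class names are hypothetical, and the assignment of regions to ALT units (MHC on ALT 1 to 7, UGT2B17 on ALT 8, MAPT on ALT 9) is an assumption made for illustration, not something read off the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AssemblyUnit:
    name: str
    # genomic region name -> label of the sequence representing it in this unit
    regions: Dict[str, str] = field(default_factory=dict)


@dataclass
class Assembly:
    name: str
    primary: AssemblyUnit
    alt_units: List[AssemblyUnit] = field(default_factory=list)

    def representations(self, region: str) -> List[str]:
        """Names of all units that carry a sequence for the given region."""
        units = [self.primary] + self.alt_units
        return [unit.name for unit in units if region in unit.regions]


def toy_grch37() -> Assembly:
    primary = AssemblyUnit(
        "Primary Assembly",
        {"MHC": "chr6", "UGT2B17": "chr4", "MAPT": "chr17"},
    )
    # Assumed placement: seven MHC alternates, then one each for the others.
    alts = [AssemblyUnit(f"ALT {i}", {"MHC": f"MHC alternate {i}"})
            for i in range(1, 8)]
    alts.append(AssemblyUnit("ALT 8", {"UGT2B17": "NT_167250.1"}))
    alts.append(AssemblyUnit("ALT 9", {"MAPT": "MAPT alternate"}))
    return Assembly("GRCh37", primary, alts)


assembly = toy_grch37()
print(assembly.representations("UGT2B17"))
print(len(assembly.representations("MHC")))
```

The point of the structure is that a region lookup returns every available representation, not only the one on the chromosome.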
Richa Agarwala




MHC alternate locus: alignment to chr6

Oh No! Not a new version of the human genome!




Assembly (e.g. GRCh37.p5)
    PAR
    Primary Assembly
    Non-nuclear assembly unit (e.g. MT)
    Alternate loci assembly units: ALT 1 to ALT 9
    Patches
    Genomic regions: MHC, UGT2B17, MAPT, ABO, SMA, PECAM1, …
Myo19 region (17q21)
[Figure: the region with TBC1D3C, TBC1D3, and TBC1D3H labeled.]
60 fix patches: chromosome will update in GRCh38
  (adds >1 Mb of novel sequence to the assembly)

70 novel patches: additional sequence added
  (adds >800 kb of novel sequence to the assembly)

Releasing patches quarterly
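
The two patch types behave differently at the next major release, which a consumer of patch releases might encode as below. This is a hedged sketch: the `Patch` record, the patch names, and the wording of the returned strings are all made up for illustration and only paraphrase the slide.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class PatchType(Enum):
    FIX = "fix"
    NOVEL = "novel"


@dataclass(frozen=True)
class Patch:
    name: str  # hypothetical label, not a real patch identifier
    patch_type: PatchType


def fate_in_next_major_release(patch: Patch) -> str:
    """Summarize what the slide says happens to each patch type."""
    if patch.patch_type is PatchType.FIX:
        return "chromosome sequence updated"
    return "kept as additional sequence"


def count_by_type(patches):
    """Count patches of each type in a release."""
    return Counter(patch.patch_type for patch in patches)


# A made-up patch release with two fix patches and one novel patch.
release = [
    Patch("example-fix-1", PatchType.FIX),
    Patch("example-fix-2", PatchType.FIX),
    Patch("example-novel-1", PatchType.NOVEL),
]
counts = count_by_type(release)
for patch in release:
    print(patch.name, "->", fate_in_next_major_release(patch))
```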
Distributed data
      Centralized Data
    Old Assembly Model
   Updated Assembly Model
Genome not in INSDC Database
  Genome in INSDC Database

Maximizing Board Effectiveness 2024 Webinar.pptx
 

Church gmod2012 pt1

  • 1. The Evolution of the Resources Navigating Genome Reference Human Genome at NCBI Part 1 Deanna M. Church, NCBI @deannachurch
  • 2. NCBI BLAST PubMed GenBank
  • 3. Twenty Two Years of Growth: NCBI Data and User Services [chart, 1989 to 2011. Axes: GenBank Base Pairs (Millions), 0 to 140,000; Users/Weekday (Average), 0 to 2,500,000. Annotated with NCBI resource names, including GenBank, BLAST, Entrez, PubMed, dbSNP, RefSeq, PubMed Central, ClinVar and GTR]
  • 4. NCBI Tools Literature Data Blast PubMed GenBank GBench PubMed Central Protein DB Splign Bookshelf SRA Cn3D MeSH GEO e-PCR Gene Reviews dbSNP e-Utilities … Gene … RefSeq …
  • 5. Entrez: Pathway to Discovery. Term frequency statistics; MEDLINE abstracts; literature citations in sequence databases; nucleotide sequences; protein sequences; nucleotide sequence similarity; amino acid sequence similarity; coding region features
  • 6. Programmatic access http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]&usehistory=y <eSearchResult> <Count>6</Count> <RetMax>6</RetMax> <RetStart>0</RetStart> <IdList> <Id>19008416</Id> <Id>18927361</Id> <Id>18787170</Id> <Id>18487186</Id> <Id>18239126</Id> <Id>18239125</Id> </IdList> … http://www.ncbi.nlm.nih.gov/books/NBK25501/
  • 7. http://www.youtube.com/NCBINLM @NCBI http://www.facebook.com/ncbi.nlm http://www.ncbi.nlm.nih.gov/education/
  • 8. Collins FS et al, 1998 Throughput: 500 Mb/year Cost: < $0.25 per base Variation: 100,000 SNPs mapped
  • 9. Steve Sherry, NCBI. NCBI dbSNP database growth [chart, 1999 to 2011. Upper panel: millions of rs-ids for human variations; legend: non-redundant SNP annotations, STR & indel, ambiguous mapping. Lower panel: millions of submissions by project; legend: 1000 Genomes, HapMap, TSC, other projects]. dbSNP build 135, November 2011
  • 10. Kidd et al, 2007 APOBEC cluster BLACK: Deletion White: Insertion
  • 12.
  • 13. Church et al., 2011 PLoS http://genomereference.org
  • 14. GRC Beginnings Distributed data Old Assembly Model Genome not in INSDC Database
  • 15.
  • 16.
  • 17.
  • 18. Build sequence contigs based on contigs defined in TPF. Check for orientation consistencies Select switch points Instantiate sequence for further analysis Switch point Consensus sequence
  • 19.
  • 21.
  • 23. Distributed data Centralized Data Old Assembly Model Genome not in INSDC Database
  • 24. Large-Scale Variation Complicates Genome Assembly Sequences from haplotype 1 Sequences from haplotype 2 Old Assembly model: compress into a consensus New Assembly model: represent both haplotypes
  • 26. UGT2B17 Region NCBI36 NC_000004.10 (chr4) Tiling Path AC079749.5 AC147055.2 AC019173.4 AC021146.7 AC074378.4 AC134921.2 AC140484.1 AC093720.2 TMPRSS11E TMPRSS11E2 GRCh37 NC_000004.11 (chr4) Tiling Path AC079749.5 AC147055.2 AC021146.7 AC074378.4 AC134921.1 AC093720.2 TMPRSS11E GRCh37: NT_167250.1 (UGT2B17 alternate locus) AC019173.4 AC021146.7 AC074378.4 AC226496.2 AC140484.1 TMPRSS11E2 Xue Y et al, 2008
  • 27. UGT2B17 MHC MAPT GRCh37 (hg19) 7 alternate haplotypes at the MHC Alternate loci released as: FASTA AGP Alignment to chromosome http://genomereference.org
  • 28.
  • 29. [Diagram] Assembly (e.g. GRCh37): primary assembly unit (with PAR); non-nuclear assembly (e.g. MT); alternate loci ALT 1 to ALT 9; genomic regions (MHC, UGT2B17, MAPT)
  • 30. Richa Agarwala MHC Alternate locus Alignment to chr6
  • 31. Oh No! Not a new version of the human genome! http://genomereference.org
  • 32.
  • 33. [Diagram] Assembly (e.g. GRCh37.p5): primary assembly unit (with PAR); non-nuclear assembly (e.g. MT); alternate loci ALT 1 to ALT 9; genomic regions (MHC, UGT2B17, MAPT, ABO, SMA, PECAM1); patches …
  • 34. TBC1D3C TBC1D3 TBC1D3H TBC1D3C Myo19 region (17q21)
  • 35. 60 FIX patches: chromosome will update in GRCh38 (adds >1 Mb of novel sequence to the assembly). 70 NOVEL patches: additional sequence added (adds >800K of novel sequence to the assembly). Releasing patches quarterly
  • 36. Distributed data Centralized Data Old Assembly Model Updated Assembly Model Genome not in INSDC Database Genome in INSDC Database
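
The programmatic access shown on slide 6 can be sketched in Python. This is a minimal illustration, not NCBI's own client: the function names `build_esearch_url` and `parse_id_list` are hypothetical, the base URL is copied from the slide as printed, and the sample XML is the slide's response trimmed to the elements shown there (the closing tag is added so that it parses). No network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL as printed on slide 6.
ESEARCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, usehistory=True):
    """Build an ESearch URL (hypothetical helper, not part of any NCBI library)."""
    params = {"db": db, "term": term}
    if usehistory:
        params["usehistory"] = "y"
    # urlencode turns spaces into '+' and percent-encodes the square brackets.
    return ESEARCH_URL + "?" + urlencode(params)


def parse_id_list(xml_text):
    """Return the record IDs listed under <IdList> in an eSearchResult document."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall("./IdList/Id")]


# Response from the slide; elements after </IdList> were elided there and are omitted here.
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>6</Count><RetMax>6</RetMax><RetStart>0</RetStart>
  <IdList>
    <Id>19008416</Id><Id>18927361</Id><Id>18787170</Id>
    <Id>18487186</Id><Id>18239126</Id><Id>18239125</Id>
  </IdList>
</eSearchResult>"""

url = build_esearch_url("pubmed", "science[journal] AND breast cancer AND 2008[pdat]")
ids = parse_id_list(SAMPLE_RESPONSE)
count = int(ET.fromstring(SAMPLE_RESPONSE).findtext("Count"))
```

To run the query for real, the URL could be passed to `urllib.request.urlopen` and the response body handed to `parse_id_list`; the E-utilities guide linked on the slide describes the remaining parameters.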

Editor's notes

  1. TPFs are loaded to a centralized system for tracking. This system also manages QA on the files as an ongoing process. The first level of QA is to look at the overlap between adjacent sequences on the TPF.
  2. When certifying an overlap, external evidence supporting the alignment must be available. Evidence typically consists of sequence data from another source, spanning clone ends, or experimental verification (such as a PCR assay detecting the join). These certificates are reviewed by other GRC members and may be approved or rejected. Certification information is publicly available.
  3. Alignments refer to pairs of sequences. Once you know how a pair of sequences goes together, you can look at stringing the pairs along into a contig. The contig is essentially the consensus sequence that is produced from the components. To create a contig, we use the steps shown on this slide. What are switch points? As you create the consensus sequence of the contig, the switch points tell you where to stop using the sequence from one component and begin using the sequence from the next.
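
The role of switch points in note 3 can be illustrated with a small Python sketch. This is an assumption-laden toy, not the GRC pipeline: `build_contig` is a hypothetical name, components are assumed to be already in tiling order and in the same orientation, and each switch point is assumed to be a pair (stop position in the current component, start position in the next component), chosen inside the certified overlap.

```python
def build_contig(components, switch_points):
    """Stitch component sequences into one consensus using switch points.

    components:    sequences in tiling-path order, all in the same orientation
                   (orientation checks are assumed to have been done already).
    switch_points: one (stop_in_current, start_in_next) pair per adjacent
                   component pair; both are 0-based offsets.
    """
    if len(switch_points) != len(components) - 1:
        raise ValueError("need exactly one switch point per adjacent pair")

    pieces = []
    start = 0  # where to begin reading the current component
    for i, seq in enumerate(components):
        is_last = i == len(components) - 1
        # Stop at the switch point, or at the end of the last component.
        stop = len(seq) if is_last else switch_points[i][0]
        pieces.append(seq[start:stop])
        if not is_last:
            # Continue from the matching position in the next component.
            start = switch_points[i][1]
    return "".join(pieces)


# Two toy components whose last/first five bases ("CGTAA") overlap.
component_a = "ACGTACGTAA"
component_b = "CGTAATTGGC"

# Switch inside the overlap: stop using A at offset 7, pick up B at offset 2.
contig = build_contig([component_a, component_b], [(7, 2)])
```

Because the two components agree across the overlap in this toy case, any switch point inside the overlap gives the same contig; where components differ, the choice of switch point decides which component's bases end up in the consensus.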