Mining Single Nucleotide Polymorphisms from public sequence databases
Gary Barker, IACR Long Ashton
What are Single Nucleotide Polymorphisms (SNPs)?
Why are these polymorphisms useful? It’s sometimes possible to correlate a  SNP with a particular trait. This is known as  association genetics.
Disease-resistant population vs. disease-susceptible population: genotype all individuals for thousands of SNPs.
Resistant:   ATGATTATAG
Susceptible: ATGTTTATAG
Resistant people all have an ‘A’ at position 4 in geneX, while susceptible people have a ‘T’.
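The comparison on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every individual is represented by one already-aligned sequence of equal length, and a position counts as associated only when each group is fixed for a different base. The function name `associated_positions` is hypothetical, and real association studies use statistical tests rather than this all-or-nothing rule.

```python
def associated_positions(resistant, susceptible):
    """Return (position, resistant_base, susceptible_base) for every 1-based
    position where each group is fixed for a single, different base."""
    hits = []
    for i in range(len(resistant[0])):
        r_bases = {seq[i] for seq in resistant}
        s_bases = {seq[i] for seq in susceptible}
        # Each group must agree internally, and the two groups must differ.
        if len(r_bases) == 1 and len(s_bases) == 1 and r_bases != s_bases:
            hits.append((i + 1, r_bases.pop(), s_bases.pop()))
    return hits


# Toy genotypes taken from the slide's geneX example.
resistant = ["ATGATTATAG", "ATGATTATAG", "ATGATTATAG"]
susceptible = ["ATGTTTATAG", "ATGTTTATAG"]

print(associated_positions(resistant, susceptible))  # [(4, 'A', 'T')]
```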
To use SNPs, you first have to find them.
Poorly studied organisms: sequence many ‘loci’ (different places in the genome) for many individuals.
Many well-studied organisms: the required data is already present in public sequence databases; it just needs to be processed.
Number of ESTs* in the EMBL database (chart)
*ESTs (expressed sequence tags) are single-pass, often partial, gene sequences.
Mining SNPs from EST sequences in the database
AutoSNP (Perl script) can find likely SNPs in data sets downloaded from public databases.
1) Marks up only those polymorphisms where each allele is supported by at least two independent sequences. This filters out most sequencing errors.
2) Adds further confidence scores based on co-segregation.
3) Writes results to HTML reports.
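The "at least two independent sequences per allele" rule can be illustrated as follows. This is a minimal Python sketch, not AutoSNP's actual Perl code: it assumes the reads are already aligned to equal length, that each read comes from an independent source, and that bases seen only once are simply ignored rather than invalidating the site. The name `candidate_snps` and the `min_support` parameter are hypothetical.

```python
from collections import Counter


def candidate_snps(aligned, min_support=2):
    """Return (position, {base: count}) for each 1-based alignment column
    where at least two different bases each occur min_support times or more."""
    snps = []
    for i in range(len(aligned[0])):
        # Count only real bases; gaps ('-') and ambiguity codes are skipped.
        counts = Counter(seq[i] for seq in aligned if seq[i] in "ACGT")
        supported = {base: n for base, n in counts.items() if n >= min_support}
        if len(supported) >= 2:
            snps.append((i + 1, supported))
    return snps


reads = [
    "ATGATTATAG",
    "ATGATTATAG",
    "ATGTTTATAG",
    "ATGTTTATAG",
    "ATGTTTACAG",  # lone 'C' at position 8: treated as a likely sequencing error
]

print(candidate_snps(reads))  # [(4, {'A': 2, 'T': 3})]
```

Position 8 is not reported because the 'C' is seen in only one read, which is the kind of singleton difference the redundancy rule is meant to filter out.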
Accessing AutoSNP results
1) Search by accession number.
2) Search with a query sequence.
Current AutoSNP approach:
1) Cluster sequences (d2cluster).
2) Align and find SNPs (cap3).
3) Store each accession number with its SNP report in a MySQL database, e.g.
   gi|11117503 | snip_1.htm
   gi|12217138 | snip_2.htm
Query with an accession: the MySQL database returns links to existing SNP reports.
Query with a sequence: the Blast client finds matching accessions, which link to existing SNP reports.
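The accession-to-report lookup described above might look roughly like this. It is a sketch only: Python's built-in sqlite3 module stands in for the MySQL database so the example runs without a server, and the table name `snp_reports`, its column names, and the function `reports_for` are assumptions, not the real AutoSNP schema. The two rows are the examples shown on the slide.

```python
import sqlite3

# In-memory SQLite database used here as a stand-in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snp_reports (accession TEXT PRIMARY KEY, report TEXT)")
conn.executemany(
    "INSERT INTO snp_reports VALUES (?, ?)",
    [("gi|11117503", "snip_1.htm"), ("gi|12217138", "snip_2.htm")],
)


def reports_for(accessions):
    """Return the report file for each known accession, skipping unknown ones."""
    found = []
    for acc in accessions:
        row = conn.execute(
            "SELECT report FROM snp_reports WHERE accession = ?", (acc,)
        ).fetchone()
        if row is not None:
            found.append(row[0])
    return found


# Direct accession query.
print(reports_for(["gi|11117503"]))  # ['snip_1.htm']

# A sequence query would first go through Blast; its matching accessions
# (hypothetical list here) are then looked up in the same way.
print(reports_for(["gi|12217138", "gi|00000000"]))  # ['snip_2.htm']
```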
Desirable:
1) Client supplies a query sequence (ATAGCGTACG……).
2) Blast search (data direct from EBI?).
3) Build contigs of results.
4) Detect eSNPs.
5) Client gets SNP report(s) (HTML) for all sequences matching the query.
Resource labels on the diagram: data and processing power (large); processing power (medium); processing power (small). Time shown: < 10 seconds.
Conclusions
SNPs (single nucleotide polymorphisms) are abundant and useful genetic markers.
Software exists to mine them from public data sets, but this doesn’t work in real time.
GRID technology could help to deliver up-to-date alignments to users for any query sequence, with putative SNPs marked up.
Related useful features would include bootstrapped trees for each alignment, generated on the fly.
