Genometry
Gregg Helt
Cyrus Harmon
Genometry
•  Motivation and Purpose
•  Points of Reference
•  Genometry interfaces
•  Genometry manipulations
•  Genometry implementation
•  Representation examples
•  Prototype apps
•  Current status, future work
Motivation and Goals
•  Desire for a more unified data model to represent
relationships between biological sequences, such as:
–  Annotations
–  Alignments
–  Sequence composition
•  More networked and less hierarchical than genome-centric or transcript-centric models
•  Simplicity
•  Expressivity / Flexibility
•  Memory and Computational Efficiency
•  Reusable by others as core functionality for various Affymetrix (Affy) projects
Points of Reference
•  com.neomorphic.bio models
•  Genisys DB and Genisys IDL
•  EBI mapping models
•  Apollo data models
•  BioPerl
•  BioJava
•  Most similar to the bio alignment models and the Genisys alignment models
Basic Annotations
[Diagram: transcript T annotated on genome G]
•  Transcript T at G:1000..5000
–  Exon E1 at G:1000..1200
–  Exon E2 at G:3000..3500
–  Exon E3 at G:4500..5000
Genometry Annotations – Specify All Coordinates
[Diagram: transcript T annotated on genome G, with coordinates given on both G and T]
•  Transcript T at G:1000..5000, T:0..1200
–  Exon E1 at G:1000..1200, T:0..200
–  Exon E2 at G:3000..3500, T:200..700
–  Exon E3 at G:4500..5000, T:700..1200
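The transcript coordinates above follow mechanically from the genomic exon coordinates: the exons are laid end to end on the transcript, each keeping its genomic length. A minimal Java sketch of that bookkeeping, assuming half-open intervals on the forward strand; `Interval` and `ExonCoords` are hypothetical names, not part of Genometry:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical half-open interval start..end (end exclusive). */
record Interval(int start, int end) {
    int length() {
        return end - start;
    }
}

final class ExonCoords {
    private ExonCoords() {}

    /**
     * Given exon spans on the genome (forward strand, sorted by start),
     * returns their spans on the spliced transcript: each exon starts where
     * the previous one ended, beginning at transcript position 0.
     */
    static List<Interval> transcriptSpans(List<Interval> genomicExons) {
        List<Interval> result = new ArrayList<>();
        int offset = 0;
        for (Interval exon : genomicExons) {
            result.add(new Interval(offset, offset + exon.length()));
            offset += exon.length();
        }
        return result;
    }
}
```

For G:1000..1200, G:3000..3500 and G:4500..5000 this returns T:0..200, T:200..700 and T:700..1200, so the whole transcript spans T:0..1200.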
Genometry Annotations – All coordinates are relative to BioSeqs
[Diagram: BioSeqs Genome G and Transcript T, with annotations TranscriptAnnot T1 and ExonAnnot E1, E2, E3]
•  TranscriptAnnot T1 at G:1000..5000, T:0..1200
–  ExonAnnot E1 at G:1000..1200, T:0..200
–  ExonAnnot E2 at G:3000..3500, T:200..700
–  ExonAnnot E3 at G:4500..5000, T:700..1200
Genometry Annotations – SeqSpans encapsulate a range along a BioSeq
[Diagram: TranscriptAnnot T1 and its children ExonAnnot E1, E2, E3, each holding one SeqSpan on Genome G and one on Transcript T]
•  TranscriptAnnot T1: G:1000..5000, T:0..1200
–  ExonAnnot E1: G:1000..1200, T:0..200
–  ExonAnnot E2: G:3000..3500, T:200..700
–  ExonAnnot E3: G:4500..5000, T:700..1200
Genometry Core Core
•  BioSeq
–  length, residues (optional)
•  SeqSpan
–  start, end, BioSeq
•  SeqSymmetry
–  SeqSpans (breadth)
–  SeqSymmetry parent / child hierarchy (depth)
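A minimal Java sketch of the three core concepts. The interface and method names are simplified and hypothetical, chosen to mirror this slide rather than the actual Genometry interfaces. Breadth is taken as the number of spans and depth as the number of levels in the parent/child hierarchy. The example builds TranscriptAnnot T1 and its three ExonAnnot children from the annotation slides:

```java
import java.util.List;

/** A sequence: a length, plus residues that may be absent (null). */
interface BioSeq {
    String id();
    int length();
    String residues();
}

/** A range start..end along one BioSeq. */
interface SeqSpan {
    int start();
    int end();
    BioSeq seq();
}

/** Spans on one or more BioSeqs (breadth) plus child symmetries (depth). */
interface SeqSymmetry {
    List<SeqSpan> spans();
    List<SeqSymmetry> children();

    /** Number of spans, i.e. how many sequences this symmetry relates. */
    default int breadth() {
        return spans().size();
    }

    /** Levels in the hierarchy: 1 for a leaf, 2 for a parent of leaves, and so on. */
    default int depth() {
        int deepest = 0;
        for (SeqSymmetry child : children()) {
            deepest = Math.max(deepest, child.depth());
        }
        return 1 + deepest;
    }

    /** The span on the given sequence, or null if this symmetry has none there. */
    default SeqSpan spanOn(BioSeq seq) {
        for (SeqSpan span : spans()) {
            if (span.seq().equals(seq)) {
                return span;
            }
        }
        return null;
    }
}

record SimpleSeq(String id, int length, String residues) implements BioSeq {}

record SimpleSpan(int start, int end, BioSeq seq) implements SeqSpan {}

record SimpleSym(List<SeqSpan> spans, List<SeqSymmetry> children) implements SeqSymmetry {}

final class CoreExample {
    private CoreExample() {}

    /** A breadth-2 symmetry relating a..b on seqA to c..d on seqB. */
    static SeqSymmetry pair(BioSeq seqA, int a, int b, BioSeq seqB, int c, int d,
                            List<SeqSymmetry> children) {
        return new SimpleSym(List.of(new SimpleSpan(a, b, seqA), new SimpleSpan(c, d, seqB)), children);
    }

    /** TranscriptAnnot T1 with ExonAnnot children E1..E3; all coordinates are on G and T. */
    static SeqSymmetry transcriptAnnot(BioSeq g, BioSeq t) {
        List<SeqSymmetry> exons = List.of(
                pair(g, 1000, 1200, t, 0, 200, List.of()),
                pair(g, 3000, 3500, t, 200, 700, List.of()),
                pair(g, 4500, 5000, t, 700, 1200, List.of()));
        return pair(g, 1000, 5000, t, 0, 1200, exons);
    }
}
```

Here T1 has breadth 2 (spans on G and T) and depth 2 (T1 over E1..E3), and in this sketch nothing points from the BioSeqs to the annotations.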
Expressiveness of Core Core
•  “Standard” annotations
•  Singleton annotations
•  Alternative Splicing
•  Pairwise alignments
•  Annotations with depth > 2
•  Annotations with breadth > 2
•  Indels
•  Structure of analyzed sequence
•  Fuzzy locations
•  All without explicit pointers from BioSeq to annotation
Genometry Modelling of Insertions and Deletions #1a
Genome G       …AGGCAATTAATTGATCCAGGTG……GAGTCCGAATAGGGTTAGCG…
Transcript T   GCAATTCAATTGATCCAG TCCGAATAGGTTAGCG
•  G:1000..2017 / T:0..34
   –  G:1000..1017 / T:0..18: insertion in transcript relative to genome (deletion in genome relative to transcript)
      –  G:1000..1006 / T:0..6
      –  G:1006..1017 / T:7..18
   –  G:2000..2017 / T:18..34: deletion in transcript relative to genome (insertion in genome relative to transcript)
      –  G:2000..2010 / T:18..28
      –  G:2011..2017 / T:28..34
Genometry Modelling of Insertions and Deletions #1b
Genome G       …AGGCAATTAATTGATCCAGGTG……GAGTCCGAATAGGGTTAGCG…
Transcript T   GCAATTCAATTGATCCAG TCCGAATAGGTTAGCG
Position labels: g0, g1, g2, g3, g4, g4+1, g5 on G; t0, t1, t1+1, t2, t3, t4, t5 on T
•  G:g0..g5 / T:t0..t5
   –  G:g0..g2 / T:t0..t2: insertion in transcript relative to genome (deletion in genome relative to transcript)
      –  G:g0..g1 / T:t0..t1
      –  G:g1..g2 / T:t1+1..t2
   –  G:g3..g5 / T:t3..t5: deletion in transcript relative to genome (insertion in genome relative to transcript)
      –  G:g3..g4 / T:t3..t4
      –  G:g4+1..g5 / T:t4..t5
Genometry Modelling of Insertions and Deletions #2
Genome G       …AGGCAATTAATTGATCCAGGTG……GAGTCCGAATAGGGTTAGCG…
Transcript T   GCAATTCAATTGATCCAG TCCGAATAGGTTAGCG
Position labels: g0, g1, g2, g3, g4, g4+1, g5 on G; t0, t1, t1+1, t2, t3, t4, t5 on T
•  G:g0..g5 / T:t0..t5
   –  G:g0..g2 / T:t0..t2: insertion in transcript relative to genome (deletion in genome relative to transcript)
      –  G:g0..g1 / T:t0..t1
      –  G:g1..g2 / T:t1+1..t2
   –  G:g3..g5 / T:t3..t5: deletion in transcript relative to genome (insertion in genome relative to transcript)
      –  G:g3..g4 / T:t3..t4
      –  G:g4+1..g5 / T:t4..t5
•  Inserted residue: T:t1..t1+1 / “C” :0..1
•  Deleted residue: G:g4..g4+1 / “G” :0..1
Genometry Modelling of Insertions and Deletions #3
Genome G       …AGGCAATTAATTGATCCAGGTG……GAGTCCGAATAGGGTTAGCG…
Transcript T   GCAATTCAATTGATCCAG TCCGAATAGGTTAGCG
Position labels: g0, g1, g2, g3, g4, g4+1, g5 on G; t0, t1, t1+1, t2, t3, t4, t5 on T
•  G:g0..g5 / T:t0..t5
   –  G:g0..g2 / T:t0..t2: insertion in transcript relative to genome (deletion in genome relative to transcript)
      –  G:g0..g1 / T:t0..t1
      –  G:g1..g2 / T:t1+1..t2
   –  G:g3..g5 / T:t3..t5: deletion in transcript relative to genome (insertion in genome relative to transcript)
      –  G:g3..g4 / T:t3..t4
      –  G:g4+1..g5 / T:t4..t5
•  Inserted residue: T:t1..t1+1 / G:g1..g1 (zero-length span on G)
•  Deleted residue: G:g4..g4+1 / T:t4..t4 (zero-length span on T)
Genometry Modelling of Insertions and Deletions #4
Genome G       …AGGCAATTAATTGATCCAGGTG……GAGTCCGAATAGGGTTAGCG…
Transcript T   GCAATTCAATTGATCCAG TCCGAATAGGTTAGCG
Position labels: g0, g1, g2, g3, g4, g4+1, g5 on G; t0, t1, t1+1, t2, t3, t4, t5 on T
•  G:g0..g5 / T:t0..t5
   –  G:g0..g2 / T:t0..t2: insertion in transcript relative to genome (deletion in genome relative to transcript)
      –  G:g0..g1 / T:t0..t1
      –  G:g1..g2 / T:t1+1..t2
   –  G:g3..g5 / T:t3..t5: deletion in transcript relative to genome (insertion in genome relative to transcript)
      –  G:g3..g4 / T:t3..t4
      –  G:g4+1..g5 / T:t4..t5
•  Inserted residue: T:t1..t1+1 / G:g1..g1 / “C”:0..1
•  Deleted residue: G:g4..g4+1 / T:t4..t4 / “G”:0..1
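Whichever representation is chosen, the indels of #1a can be recovered from the gaps between adjacent ungapped children of each aligned block: a gap on the transcript side only is an insertion in the transcript, and a gap on the genome side only is a deletion. A minimal Java sketch, assuming forward-strand children sorted within one block; `IndelFinder` and its records are hypothetical. The gaps it reports correspond to the zero-length-span symmetries of #3:

```java
import java.util.ArrayList;
import java.util.List;

final class IndelFinder {
    private IndelFinder() {}

    /** One ungapped child: genome gStart..gEnd aligned to transcript tStart..tEnd. */
    record Block(int gStart, int gEnd, int tStart, int tEnd) {}

    /** A one-sided gap; the span on the other sequence has zero length. */
    record Indel(String kind, int gStart, int gEnd, int tStart, int tEnd) {}

    /**
     * Compares each pair of adjacent children of one aligned block.
     * Gaps on both sides at once (e.g. an intron next to unaligned sequence)
     * are not classified by this sketch.
     */
    static List<Indel> find(List<Block> children) {
        List<Indel> indels = new ArrayList<>();
        for (int i = 1; i < children.size(); i++) {
            Block prev = children.get(i - 1);
            Block next = children.get(i);
            int genomeGap = next.gStart() - prev.gEnd();
            int transcriptGap = next.tStart() - prev.tEnd();
            if (transcriptGap > 0 && genomeGap == 0) {
                indels.add(new Indel("insertion in transcript",
                        prev.gEnd(), next.gStart(), prev.tEnd(), next.tStart()));
            } else if (genomeGap > 0 && transcriptGap == 0) {
                indels.add(new Indel("deletion in transcript",
                        prev.gEnd(), next.gStart(), prev.tEnd(), next.tStart()));
            }
        }
        return indels;
    }
}
```

For the first block of #1a (children G:1000..1006 / T:0..6 and G:1006..1017 / T:7..18) this reports an insertion at T:6..7 with the zero-length genome span G:1006..1006; for the second block it reports a deletion at G:2010..2011 with T:28..28.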
Modelling SNPs with Genometry: Two Approaches
SeqA = reference chromosome
SeqB = exactly same as reference chromosome, except for one SNP
SeqA  …GGCAAGGAATGATC…
SeqB  …GGCAAGTAATGATC…
The SNP lies at x..x+1 (G in SeqA, T in SeqB).
I. SNPs as annotations of differences between sequences
–  I.a. annotation of just reference seq: SeqA : x..x+1
–  I.b. annotation of reference seq w/ variant base: SeqA : x..x+1, “T” : 0..1
–  I.c. annotation of reference and variant seq: SeqA : x..x+1, SeqB : x..x+1
II. SNPs as gaps in similarity between two sequences
–  SeqA : 0..m / SeqB : 0..n, with children:
   –  SeqA : 0..x / SeqB : 0..x
   –  SeqA : x+1..m / SeqB : x+1..n
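A minimal Java sketch of approaches I.b, I.c and II, treating the two fragments shown as complete sequences (so the SNP falls at x = 6). `SnpModels` and its records are hypothetical, and sequences are identified by name only; in I.b the one-residue variant sequence is simply named after its base:

```java
import java.util.List;

final class SnpModels {
    private SnpModels() {}

    record Span(String seq, int start, int end) {}

    record Sym(List<Span> spans) {}

    /** Position of the first mismatch between equal-length sequences, or -1 if identical. */
    static int firstMismatch(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("expected equal lengths");
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return -1;
    }

    /** Approach I.b: the reference position paired with a one-residue variant sequence. */
    static Sym withVariantBase(int x, char variantBase) {
        return new Sym(List.of(new Span("SeqA", x, x + 1), new Span(String.valueOf(variantBase), 0, 1)));
    }

    /** Approach I.c: one symmetry annotating the SNP on both reference and variant. */
    static Sym asDifference(int x) {
        return new Sym(List.of(new Span("SeqA", x, x + 1), new Span("SeqB", x, x + 1)));
    }

    /** Approach II: two similarity blocks, with the SNP left as the gap between them. */
    static List<Sym> asSimilarityGap(int x, int m, int n) {
        return List.of(
                new Sym(List.of(new Span("SeqA", 0, x), new Span("SeqB", 0, x))),
                new Sym(List.of(new Span("SeqA", x + 1, m), new Span("SeqB", x + 1, n))));
    }
}
```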
Sequence-oriented annotations
•  AnnotatedBioSeq
–  Contains a collection of SeqSymmetries that annotate the
sequence
–  Interfaces to retrieve annotations covered by a span within the
sequence
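A minimal Java sketch of an annotated sequence with span queries. It is hypothetical: each annotation is reduced to its span on this sequence, and the queries are linear scans, where a real implementation would more likely keep a sorted or interval-tree index. Both readings of “covered by a span” are shown, overlap and containment:

```java
import java.util.ArrayList;
import java.util.List;

final class AnnotatedSeq {
    /** An annotation, reduced to a name and its span start..end on this sequence. */
    record Annot(String name, int start, int end) {}

    private final String id;
    private final int length;
    private final List<Annot> annotations = new ArrayList<>();

    AnnotatedSeq(String id, int length) {
        this.id = id;
        this.length = length;
    }

    String id() {
        return id;
    }

    int length() {
        return length;
    }

    void addAnnotation(Annot annot) {
        annotations.add(annot);
    }

    /** Annotations sharing at least one position with start..end. */
    List<Annot> overlapping(int start, int end) {
        List<Annot> hits = new ArrayList<>();
        for (Annot a : annotations) {
            if (a.start() < end && start < a.end()) {
                hits.add(a);
            }
        }
        return hits;
    }

    /** Annotations lying entirely inside start..end. */
    List<Annot> contained(int start, int end) {
        List<Annot> hits = new ArrayList<>();
        for (Annot a : annotations) {
            if (start <= a.start() && a.end() <= end) {
                hits.add(a);
            }
        }
        return hits;
    }
}
```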
Annotation Networks
•  Can traverse networks of annotations, alternating between
AnnotatedBioSeqs and SeqSymmetries
[Diagram: SeqSymmetries linking the AnnotatedBioSeqs GenomicSeq G, mRNASeq M and ProteinSeq P]
•  domainOnProtein: proteinSpanA
•  protein2mRNA: proteinSpanB, mrnaSpanB
•  mRNA2genomic: genomicSpanC, mrnaSpanC
–  children m2gSub0 (gSpanC0, mSpanC0), m2gSub1 (gSpanC1, mSpanC1), m2gSub2 (gSpanC2, mSpanC2)
Sequence Composition
•  CompositeBioSeq
– Contains a SeqSymmetry describing the mapping
of BioSeqs used in composition to the
CompositeBioSeq itself
Sequence Composition Representations
•  Sequence Assembly / Golden Path / etc.
•  Piecewise data loading / lazy data loading
•  Genotypes
•  Chromosomal Rearrangements
•  Primer construction
•  Reverse Complement
•  Coordinate Shifting
Genometry Modelling of Reverse Complement
Sequence B = reverse complement of Sequence A
[Diagram: BioSeq A (length: x) and Composite BioSeq B (length: x); the composition Sym AB has spans A:0..x and B:x..0]
SeqA  AGGCAATTAATTGATCCAGGTGGAGTCCGAATAGGGTTAGCGA
SeqB  TCGCTAACCCTATTCGGACTCCACCTGGATCAATTAATTGCCT
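A minimal Java sketch of deriving a CompositeBioSeq's residues from its composition. It assumes, hypothetically, that each piece of the composition is a pair of equal-length spans (source and composite), and that a span written with start > end, like B:x..0 above, is reversed, so a piece whose two spans have opposite orientation contributes the reverse complement. Positions that no piece covers are filled with 'N':

```java
import java.util.Arrays;
import java.util.List;

final class Composite {
    private Composite() {}

    /** Source residues srcStart..srcEnd placed at composite dstStart..dstEnd. */
    record Piece(String sourceResidues, int srcStart, int srcEnd, int dstStart, int dstEnd) {}

    static String residues(int length, List<Piece> composition) {
        char[] out = new char[length];
        Arrays.fill(out, 'N'); // positions no piece covers
        for (Piece p : composition) {
            int srcMin = Math.min(p.srcStart(), p.srcEnd());
            int srcMax = Math.max(p.srcStart(), p.srcEnd());
            int dstMin = Math.min(p.dstStart(), p.dstEnd());
            int dstMax = Math.max(p.dstStart(), p.dstEnd());
            if (srcMax - srcMin != dstMax - dstMin) {
                throw new IllegalArgumentException("spans must have equal length");
            }
            String segment = p.sourceResidues().substring(srcMin, srcMax);
            boolean srcForward = p.srcStart() <= p.srcEnd();
            boolean dstForward = p.dstStart() <= p.dstEnd();
            if (srcForward != dstForward) {
                segment = reverseComplement(segment); // opposite orientations
            }
            segment.getChars(0, segment.length(), out, dstMin);
        }
        return new String(out);
    }

    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            sb.append(complement(s.charAt(i)));
        }
        return sb.toString();
    }

    private static char complement(char base) {
        return switch (base) {
            case 'A' -> 'T';
            case 'T' -> 'A';
            case 'G' -> 'C';
            case 'C' -> 'G';
            default -> 'N'; // ambiguity codes are not handled in this sketch
        };
    }
}
```

With SeqA from the slide (x = 43), a single piece mapping A:0..43 to B:43..0 yields exactly the SeqB residues shown.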
MultiSequence Alignments
•  MultiSeqAlignment
–  Alignments sliced “horizontally” -- each “row” in an alignment is a
CompositeBioSeq whose composition maps another BioSeq to the same
coord space as the alignment
•  Can also slice vertically (synteny)
Alignment Representations
•  Can represent same alignment as either MultiSeqAlignment or Synteny
•  Transformation from horizontal slicing (MultiSeqAlignment) to vertical
slicing (Synteny)
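A minimal Java sketch of one horizontal-to-vertical transformation. Each row is stored, hypothetically, as ungapped blocks placing its own sequence into alignment columns (forward orientation only), and a vertical slice over a column range yields one span per row, i.e. a symmetry with one span per aligned sequence. The layout and names are assumptions, not the MultiSeqAlignment API:

```java
import java.util.ArrayList;
import java.util.List;

final class AlignmentSlices {
    private AlignmentSlices() {}

    /** Row sequence seqStart..seqStart+len occupies alignment columns alnStart..alnStart+len. */
    record Block(int seqStart, int alnStart, int len) {}

    /** One horizontal row: a sequence and its ungapped blocks. */
    record Row(String seqId, List<Block> blocks) {}

    record Span(String seqId, int start, int end) {}

    /** Vertical slice over columns c0..c1: for each row, the range of its own sequence inside the window. */
    static List<Span> slice(List<Row> rows, int c0, int c1) {
        List<Span> spans = new ArrayList<>();
        for (Row row : rows) {
            int lo = Integer.MAX_VALUE;
            int hi = Integer.MIN_VALUE;
            for (Block b : row.blocks()) {
                int colStart = Math.max(c0, b.alnStart());
                int colEnd = Math.min(c1, b.alnStart() + b.len());
                if (colStart >= colEnd) {
                    continue; // block lies outside the window
                }
                lo = Math.min(lo, b.seqStart() + (colStart - b.alnStart()));
                hi = Math.max(hi, b.seqStart() + (colEnd - b.alnStart()));
            }
            if (lo < hi) {
                spans.add(new Span(row.seqId(), lo, hi)); // rows that are all gap here are omitted
            }
        }
        return spans;
    }
}
```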
Complete Genometry Core Models
•  Mutability
•  Curations
Genometry Manipulations
•  Symmetry Intersection (AND)
•  Symmetry Union (OR)
•  Symmetry Inverse (NOT)
•  Symmetry Mutual Exclusion (XOR)
•  Symmetry Transformation / Mapping
Symmetry Combination Operations
[Diagram: SymA and SymB, with the results XOR(A, B), AND(A, B), OR(A, B), NOT(A) and NOT(B)]
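A minimal Java sketch of these operations for the simplest case: each symmetry is reduced to its spans on a single sequence and treated as a set of half-open intervals, with NOT taken relative to the whole sequence 0..length. This is a standard interval-set approach offered as an assumption, not the Genometry implementation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SymOps {
    private SymOps() {}

    record Span(int start, int end) {}

    /** Sorts spans and merges any that overlap or abut; empty spans are dropped. */
    static List<Span> normalize(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(Span::start));
        List<Span> out = new ArrayList<>();
        for (Span s : sorted) {
            if (s.start() >= s.end()) {
                continue;
            }
            if (!out.isEmpty() && s.start() <= out.get(out.size() - 1).end()) {
                Span last = out.remove(out.size() - 1);
                out.add(new Span(last.start(), Math.max(last.end(), s.end())));
            } else {
                out.add(s);
            }
        }
        return out;
    }

    static List<Span> or(List<Span> a, List<Span> b) {
        List<Span> all = new ArrayList<>(a);
        all.addAll(b);
        return normalize(all);
    }

    /** Complement within 0..seqLength (spans are assumed to lie inside it). */
    static List<Span> not(List<Span> a, int seqLength) {
        List<Span> out = new ArrayList<>();
        int pos = 0;
        for (Span s : normalize(a)) {
            if (s.start() > pos) {
                out.add(new Span(pos, s.start()));
            }
            pos = Math.max(pos, s.end());
        }
        if (pos < seqLength) {
            out.add(new Span(pos, seqLength));
        }
        return out;
    }

    /** AND via De Morgan: NOT(OR(NOT a, NOT b)). */
    static List<Span> and(List<Span> a, List<Span> b, int seqLength) {
        return not(or(not(a, seqLength), not(b, seqLength)), seqLength);
    }

    /** XOR: positions in OR(a, b) but not in AND(a, b). */
    static List<Span> xor(List<Span> a, List<Span> b, int seqLength) {
        return and(or(a, b), not(and(a, b, seqLength), seqLength), seqLength);
    }
}
```

For example, with A = {0..10, 20..30}, B = {5..25} on a sequence of length 40, AND gives {5..10, 20..25} and XOR gives {0..5, 10..20, 25..30}.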
Genometry Transformations
•  Every symmetry of breadth > 1 describes a mapping
between different sequences
•  Therefore every symmetry can be used to transform
coordinates of other symmetries from one sequence
to another
•  Because sequence annotations, alignments, and composition are all based on symmetries, any of them can be used as a mapping
•  Discontiguous linear mapping algorithm
•  Results of transformation are also symmetries
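A minimal Java sketch of discontiguous linear mapping through a breadth-2 symmetry, assuming each child relates two equal-length, forward-oriented spans. It maps a span on the first sequence through every child it overlaps and returns one piece per overlapping child; the names are hypothetical. Using the transcript children from the annotation slides (G:1000..1200 / T:0..200, G:3000..3500 / T:200..700, G:4500..5000 / T:700..1200), a SNP at G:3100..3101 maps to T:300..301, while one at G:2000..2001 falls in an intron and maps to nothing:

```java
import java.util.ArrayList;
import java.util.List;

final class SpanMapper {
    private SpanMapper() {}

    /** One child: aStart..aEnd on sequence A aligned position-for-position with bStart..bEnd on B. */
    record Link(int aStart, int aEnd, int bStart, int bEnd) {
        Link {
            if (aEnd - aStart != bEnd - bStart) {
                throw new IllegalArgumentException("child spans must have equal length");
            }
        }
    }

    record Span(int start, int end) {}

    /** Maps start..end on A through each overlapping child; the result may be discontiguous. */
    static List<Span> mapAtoB(List<Link> children, int start, int end) {
        List<Span> pieces = new ArrayList<>();
        for (Link link : children) {
            int lo = Math.max(start, link.aStart());
            int hi = Math.min(end, link.aEnd());
            if (lo >= hi) {
                continue; // this child does not overlap the query
            }
            int shift = link.bStart() - link.aStart(); // same orientation assumed
            pieces.add(new Span(lo + shift, hi + shift));
        }
        return pieces;
    }
}
```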
Coordinate Mapping
Example – mapping domain from protein coords to genomic coords
[Diagram: AnnotatedBioSeqs ProteinSeq P, mRNASeq M and GenomicSeq G, linked by the SeqSymmetries domainOnProtein, protein2mRNA and mRNA2genomic (children m2gSub0, m2gSub1, m2gSub2)]
•  Transform domainOnProtein (proteinSpanA) via protein2mRNA (proteinSpanB, mrnaSpanB), yielding domain2genomic (proteinSpanA, mrnaSpanA)
•  Transform that result via mRNA2genomic (genomicSpanC, mrnaSpanC), yielding domain2genomic (proteinSpanA, mrnaSpanA, genomicSpanA) with children:
–  d2gSub0: pSpanA0, mSpanA0, gSpanA0
–  d2gSub1: pSpanA1, mSpanA1, gSpanA1
(Note that the domain, once mapped onto the spliced transcript, overlaps only two of the three exons, so the resulting domain2genomic symmetry has only two children.)
Legend: AnnotatedBioSeq (BioSeq); SeqSymmetry (SeqAnnot); “Growing” domain2genomic result = MutableSeqSymmetry
Details of “split” mapping
[Diagram: the “growing” domain2genomic result (proteinSpanA, mrnaSpanA) mapped through mRNA2genomic (genomicSpanC, mrnaSpanC; children m2gSub0, m2gSub1, m2gSub2 with spans gSpanC0..gSpanC2 and mSpanC0..mSpanC2)]
•  Step 1, looping over the children [a, b, c]
–  Step 1a “sit still”: new child d2gSub0 gets mSpanA0
–  Step 1b “roll back”: d2gSub0 gets pSpanA0
–  Step 1c “roll forward”: d2gSub0 gets gSpanA0
–  repeated for d2gSub1 (mSpanA1, pSpanA1, gSpanA1)
•  Step 2 “roll up”: domain2genomic gets genomicSpanA
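A minimal Java sketch of these steps, under stated assumptions: the domain is given as a protein span plus its mRNA span (the result of the first transformation), every span is forward-oriented, and positions are mapped linearly between spans, so protein and mRNA spans may differ in length. The coordinates are hypothetical: exons at mRNA 0..210, 210..720 and 720..1200 on genomic 1000..1210, 3000..3510 and 4500..4980, and a domain at protein 50..200 (mRNA 150..600), which overlaps only the first two exons:

```java
import java.util.ArrayList;
import java.util.List;

final class SplitMapping {
    private SplitMapping() {}

    record Span(int start, int end) {
        int length() {
            return end - start;
        }
    }

    /** A child of mRNA2genomic: an mRNA span aligned with a genomic span. */
    record Exon(Span mrna, Span genomic) {}

    /** A child of the growing domain2genomic result (d2gSubN). */
    record Piece(Span protein, Span mrna, Span genomic) {}

    /** domain2genomic: its genomic span plus its children. */
    record Result(Span genomic, List<Piece> children) {}

    /** Linear map of pos from span 'from' onto span 'to' (integer division truncates). */
    static int mapPos(int pos, Span from, Span to) {
        return to.start() + (int) ((long) (pos - from.start()) * to.length() / from.length());
    }

    static Result map(Span domainProtein, Span domainMrna, List<Exon> exons) {
        List<Piece> children = new ArrayList<>();
        // Step 1: visit each child of the mapping symmetry
        for (Exon exon : exons) {
            // Step 1a "sit still": intersect with the child on the shared sequence (mRNA)
            int lo = Math.max(domainMrna.start(), exon.mrna().start());
            int hi = Math.min(domainMrna.end(), exon.mrna().end());
            if (lo >= hi) {
                continue; // exon not overlapped: no child is added
            }
            Span mrna = new Span(lo, hi);
            // Step 1b "roll back": map the piece back onto the protein
            Span protein = new Span(mapPos(lo, domainMrna, domainProtein), mapPos(hi, domainMrna, domainProtein));
            // Step 1c "roll forward": map the piece onto the genome through this exon
            Span genomic = new Span(mapPos(lo, exon.mrna(), exon.genomic()), mapPos(hi, exon.mrna(), exon.genomic()));
            children.add(new Piece(protein, mrna, genomic));
        }
        if (children.isEmpty()) {
            return new Result(null, children);
        }
        // Step 2 "roll up": the parent's genomic span covers all of its children
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (Piece piece : children) {
            min = Math.min(min, piece.genomic().start());
            max = Math.max(max, piece.genomic().end());
        }
        return new Result(new Span(min, max), children);
    }
}
```

With those numbers the result has two children, protein 50..70 / mRNA 150..210 / genomic 1150..1210 and protein 70..200 / mRNA 210..600 / genomic 3000..3390, and the rolled-up genomic span is 1150..3390.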
Transformations Applications
•  Mapping Affy probes to genome
•  Mapping contig annotations to larger genomic assemblies
•  Mapping protein annotations to genome
•  Mapping genomic annotations to proteins and transcripts
(SNPs, for example)
•  Sequence slice-and-dice with annotation propagation
•  Propagation of annotations across versioned sequences (such
as Golden Path)
•  Deep mappings (for example, SNP to genomeA to transcriptB to
proteinC to homolog proteinD to transcriptE to genomeF to
putative SNP location in genomeF – symmetry path of depth 5)
•  Etc., etc.
Prototypes & Applications
•  GenometryTest
•  Generic Genometry Viewer
•  ProtAnnot (Ann)
•  GPView (Cyrus)
•  AlignView (Eric)
•  ContigViewer (Peter, Barry)
•  Unibrow (Transcriptome Group)
Genometry Summary
•  Genometry presents a unified model for
location-based sequence relationships
•  Sequence annotation, composition, and
alignment are all based on SeqSymmetry
•  Provides powerful genometry manipulations --
any SeqSymmetry can be used to map other
SeqSymmetries across sequences /
coordinate spaces
•  Work in progress

IGB genome genometry data models by Gregg Helt and Cyrus Harmon

2. Genometry
•  Motivation and Purpose
•  Points of Reference
•  Genometry interfaces
•  Genometry manipulations
•  Genometry implementation
•  Representation examples
•  Prototype apps
•  Current status, future work
3. Motivation and Goals
•  Desire for a more unified data model to represent relationships between biological sequences, such as:
–  Annotations
–  Alignments
–  Sequence composition
•  More networked, less hierarchical (genome-centric, transcript-centric)
•  Simplicity
•  Expressivity / Flexibility
•  Memory and Computational Efficiency
•  Use by others to provide core functionality for various Affy projects
4. Points of Reference
•  com.neomorphic.bio models
•  Genisys DB and Genisys IDL
•  EBI mapping models
•  Apollo data models
•  BioPerl
•  BioJava
•  Closest similarity to bio alignment models and Genisys alignment models
5. Basic Annotations
Figure: Transcript T annotated on Genome G.
•  Transcript T: G:1000..5000
–  Exon E1: G:1000..1200
–  Exon E2: G:3000..3500
–  Exon E3: G:4500..5000
6. Genometry Annotations – Specify All Coordinates
Figure: Transcript T annotated on Genome G, with coordinates given on both sequences.
•  Transcript T: G:1000..5000, T:0..1200
–  Exon E1: G:1000..1200, T:0..200
–  Exon E2: G:3000..3500, T:200..700
–  Exon E3: G:4500..5000, T:700..1200
7. Genometry Annotations – All coordinates are relative to BioSeqs
Figure: annotations relating the BioSeqs Transcript T and Genome G.
•  TranscriptAnnot T1: G:1000..5000, T:0..1200
–  ExonAnnot E1: G:1000..1200, T:0..200
–  ExonAnnot E2: G:3000..3500, T:200..700
–  ExonAnnot E3: G:4500..5000, T:700..1200
8. Genometry Annotations – SeqSpans encapsulate a range along a BioSeq
Figure: the same annotations, with each coordinate range drawn as a SeqSpan on Transcript T or Genome G.
•  TranscriptAnnot T1: G:1000..5000, T:0..1200
–  ExonAnnot E1: G:1000..1200, T:0..200
–  ExonAnnot E2: G:3000..3500, T:200..700
–  ExonAnnot E3: G:4500..5000, T:700..1200
9. Genometry Core Core
•  BioSeq
–  length, residues (optional)
•  SeqSpan
–  start, end, BioSeq
•  SeqSymmetry
–  SeqSpans (breadth)
–  SeqSymmetry parent / child hierarchy (depth)
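The three core interfaces can be illustrated with a short Python sketch. It is not the actual Genometry interfaces: the class and method names (`span_on`, `exon`) and the genome length of 10,000 are assumptions made for illustration. It uses interbase coordinates, so G:1000..1200 covers 200 bases. It builds the transcript example from the annotation slides, where a parent symmetry has breadth 2 (a span on G and a span on T) and three exon children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity equality: two BioSeqs are the same only if they are the same object
class BioSeq:
    """A sequence: a length plus optional residues."""
    id: str
    length: int
    residues: Optional[str] = None


@dataclass
class SeqSpan:
    """A range along one BioSeq in interbase coordinates; start > end would mean reverse."""
    start: int
    end: int
    seq: BioSeq

    @property
    def length(self) -> int:
        return abs(self.end - self.start)


@dataclass
class SeqSymmetry:
    """SeqSpans give breadth; child SeqSymmetries give depth."""
    spans: List[SeqSpan] = field(default_factory=list)
    children: List["SeqSymmetry"] = field(default_factory=list)

    def span_on(self, seq: BioSeq) -> Optional[SeqSpan]:
        # Return this symmetry's span on the given BioSeq, or None if it has none.
        return next((s for s in self.spans if s.seq is seq), None)


# Transcript example from the annotation slides (the genome length is made up).
genome = BioSeq("G", 10_000)
transcript = BioSeq("T", 1200)


def exon(g_start: int, g_end: int, t_start: int, t_end: int) -> SeqSymmetry:
    # Hypothetical helper: one exon is a breadth-2 symmetry with a span on G and a span on T.
    return SeqSymmetry([SeqSpan(g_start, g_end, genome), SeqSpan(t_start, t_end, transcript)])


t1 = SeqSymmetry(
    spans=[SeqSpan(1000, 5000, genome), SeqSpan(0, 1200, transcript)],
    children=[exon(1000, 1200, 0, 200), exon(3000, 3500, 200, 700), exon(4500, 5000, 700, 1200)],
)
```

There are no pointers from a BioSeq to its annotations here. This matches the "Expressiveness of Core Core" point that everything works without such pointers.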
10. Expressiveness of Core Core
•  “Standard” annotations
•  Singleton annotations
•  Alternative Splicing
•  Pairwise alignments
•  Annotations with depth > 2
•  Annotations with breadth > 2
•  Indels
•  Structure of analyzed sequence
•  Fuzzy locations
•  All without explicit pointers from BioSeq to annotation
11. Genometry Modelling of Insertions and Deletions #1a
Genome G: …AGGCAATTAATTGATCCAGGTG……GAGTCCGAATAGGGTTAGCG…
Transcript T: GCAATTCAATTGATCCAG … TCCGAATAGGTTAGCG
•  Top-level symmetry: G:1000..2017, T:0..34
•  Insertion in transcript relative to genome (deletion in genome relative to transcript): G:1000..1017, T:0..18
–  G:1000..1006, T:0..6
–  G:1006..1017, T:7..18
•  Deletion in transcript relative to genome (insertion in genome relative to transcript): G:2000..2017, T:18..34
–  G:2000..2010, T:18..28
–  G:2011..2017, T:28..34
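In approach #1a an indel is never stored explicitly. It shows up as a gap between consecutive children that exists on one sequence but not the other. The Python sketch below assumes forward-strand children stored as plain tuples; the function names are hypothetical. It recovers the two indels from the slide's coordinates. It also checks that every child pairs identical residues, assuming G:1000 and G:2000 fall on the first transcript-matching base of each genome fragment.

```python
from typing import List, Tuple

# One aligned child block: (genome_start, genome_end, transcript_start, transcript_end),
# interbase coordinates, forward strand, sorted along both sequences.
Block = Tuple[int, int, int, int]


def find_indels(blocks: List[Block]) -> List[Tuple[str, Tuple[int, int], Tuple[int, int]]]:
    """Classify the gap between each pair of consecutive children of one parent symmetry."""
    events = []
    for prev, nxt in zip(blocks, blocks[1:]):
        g_gap = (prev[1], nxt[0])
        t_gap = (prev[3], nxt[2])
        if g_gap[0] == g_gap[1] and t_gap[0] < t_gap[1]:
            events.append(("insertion in transcript", g_gap, t_gap))
        elif t_gap[0] == t_gap[1] and g_gap[0] < g_gap[1]:
            events.append(("deletion in transcript", g_gap, t_gap))
    return events


def blocks_match(blocks: List[Block], genome: str, g_offset: int,
                 transcript: str, t_offset: int) -> bool:
    """Check that each child block pairs identical residues on the two sequences."""
    return all(
        genome[gs - g_offset:ge - g_offset] == transcript[ts - t_offset:te - t_offset]
        for gs, ge, ts, te in blocks
    )


# Children of the two parent symmetries on the slide.
insertion_parent = [(1000, 1006, 0, 6), (1006, 1017, 7, 18)]
deletion_parent = [(2000, 2010, 18, 28), (2011, 2017, 28, 34)]
```

Applied to each parent separately, `find_indels` reports the inserted base at T:6..7 (a “C”) and the deleted base at G:2010..2011 (a “G”). The parents are processed separately because the intron between them would otherwise be reported as a deletion as well.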
12. Genometry Modelling of Insertions and Deletions #1b
The same example as #1a, with symbolic coordinates (genome positions g0 < g1 < g2 < g3 < g4 < g4+1 < g5, transcript positions t0 < t1 < t1+1 < t2, t3 < t4 < t5).
•  Top-level symmetry: G:g0..g5, T:t0..t5
•  Insertion in transcript relative to genome (deletion in genome relative to transcript): G:g0..g2, T:t0..t2
–  G:g0..g1, T:t0..t1
–  G:g1..g2, T:t1+1..t2
•  Deletion in transcript relative to genome (insertion in genome relative to transcript): G:g3..g5, T:t3..t5
–  G:g3..g4, T:t3..t4
–  G:g4+1..g5, T:t4..t5
13. Genometry Modelling of Insertions and Deletions #2
The symmetries of #1b, plus an explicit symmetry for each indel base that pairs it with a one-base residue sequence:
•  Insertion in transcript relative to genome: T:t1..t1+1 with “C”:0..1
•  Deletion in transcript relative to genome: G:g4..g4+1 with “G”:0..1
14. Genometry Modelling of Insertions and Deletions #3
The symmetries of #1b, plus an explicit symmetry for each indel base that pairs it with a zero-length span on the other sequence:
•  Insertion in transcript relative to genome: T:t1..t1+1 with G:g1..g1
•  Deletion in transcript relative to genome: G:g4..g4+1 with T:t4..t4
15. Genometry Modelling of Insertions and Deletions #4
The symmetries of #1b, plus an explicit symmetry for each indel base that combines both previous approaches (a zero-length span and a residue sequence):
•  Insertion in transcript relative to genome: T:t1..t1+1 with G:g1..g1 and “C”:0..1
•  Deletion in transcript relative to genome: G:g4..g4+1 with T:t4..t4 and “G”:0..1
16. Modelling SNPs with Genometry: Two Approaches
SeqA = reference chromosome: …GGCAAGGAATGATC…
SeqB = exactly the same as the reference chromosome, except for one SNP at x..x+1: …GGCAAGTAATGATC…
•  I. SNPs as annotations of differences between sequences
–  I.a. Annotation of just the reference seq: SeqA:x..x+1
–  I.b. Annotation of the reference seq with the variant base: SeqA:x..x+1, “T”:0..1
–  I.c. Annotation of the reference and variant seqs: SeqA:x..x+1, SeqB:x..x+1
•  II. SNPs as gaps in similarity between two sequences: SeqA:0..m, SeqB:0..n
–  SeqA:0..x, SeqB:0..x
–  SeqA:x+1..m, SeqB:x+1..n
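The two SNP approaches can be contrasted with a small Python sketch; the function names are hypothetical. `apply_snp` follows approach I.b: the annotation SeqA:x..x+1 plus the one-base span “T”:0..1 is enough to derive the variant sequence. `similarity_blocks` follows approach II: it emits the aligned blocks, and the SNP is the gap between them. It assumes equal-length sequences that differ only by substitutions, so each block s..e stands for SeqA:s..e paired with SeqB:s..e.

```python
from typing import List, Tuple


def apply_snp(reference: str, x: int, variant_base: str) -> str:
    """Approach I.b: SeqA:x..x+1 paired with a one-base span "T":0..1 yields the variant."""
    if len(variant_base) != 1:
        raise ValueError("a SNP replaces exactly one base")
    return reference[:x] + variant_base + reference[x + 1:]


def similarity_blocks(seq_a: str, seq_b: str) -> List[Tuple[int, int]]:
    """Approach II: aligned blocks between two equal-length, substitution-only sequences.

    Each block s..e stands for the pair SeqA:s..e, SeqB:s..e; the SNPs are the gaps
    between consecutive blocks.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch handles substitutions only")
    blocks, start = [], 0
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            if i > start:
                blocks.append((start, i))
            start = i + 1  # the next block starts after the mismatched base
    if start < len(seq_a):
        blocks.append((start, len(seq_a)))
    return blocks
```

With the slide's fragments and x = 6, `apply_snp("GGCAAGGAATGATC", 6, "T")` reproduces SeqB. `similarity_blocks` returns `[(0, 6), (7, 14)]`, leaving the gap 6..7 at the SNP.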
18. Sequence-oriented annotations
•  AnnotatedBioSeq
–  Contains a collection of SeqSymmetries that annotate the sequence
–  Interfaces to retrieve annotations covered by a span within the sequence
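Here is a minimal Python sketch of the AnnotatedBioSeq idea: a sequence that holds its annotations and answers span queries. The method names are hypothetical. Each annotation is simplified to a name plus one forward span on this sequence, standing in for a full SeqSymmetry. Overlap uses interbase semantics, so spans that merely touch (1000..1200 and 1200..3000) do not overlap.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical simplification: (name, start, end) on this sequence, forward interbase
# coordinates, standing in for a full SeqSymmetry.
Annotation = Tuple[str, int, int]


@dataclass
class AnnotatedBioSeq:
    id: str
    length: int
    annotations: List[Annotation] = field(default_factory=list)

    def add_annotation(self, name: str, start: int, end: int) -> None:
        self.annotations.append((name, start, end))

    def annotations_overlapping(self, start: int, end: int) -> List[str]:
        """Names of annotations sharing at least one base with start..end (linear scan)."""
        return [name for name, s, e in self.annotations if s < end and start < e]


genome = AnnotatedBioSeq("G", 10_000)
for name, s, e in [("E1", 1000, 1200), ("E2", 3000, 3500), ("E3", 4500, 5000)]:
    genome.add_annotation(name, s, e)
```

The linear scan keeps the sketch short. A production implementation would more likely keep the annotations sorted or in an interval index, though that is an assumption rather than something the slides specify.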
19. Annotation Networks
•  Can traverse networks of annotations, alternating between AnnotatedBioSeqs and SeqSymmetries
Figure: three AnnotatedBioSeqs (Annotated ProteinSeq P, Annotated mRNASeq M, Annotated GenomicSeq G) linked by SeqSymmetries:
–  domainOnProtein (proteinSpanA) on P
–  protein2mRNA (proteinSpanB, mrnaSpanB) between P and M
–  mRNA2genomic (mrnaSpanC, genomicSpanC) between M and G, with children m2gSub0, m2gSub1 and m2gSub2 (mSpanC0..mSpanC2, gSpanC0..gSpanC2)
20. Sequence Composition
•  CompositeBioSeq
–  Contains a SeqSymmetry describing the mapping of BioSeqs used in composition to the CompositeBioSeq itself
21. Sequence Composition Representations
•  Sequence Assembly / Golden Path / etc.
•  Piecewise data loading / lazy data loading
•  Genotypes
•  Chromosomal Rearrangements
•  Primer construction
•  Reverse Complement
•  Coordinate Shifting
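A CompositeBioSeq has no residues of its own; it derives them from its components through the composition symmetry. The Python sketch below assumes forward-strand pieces of equal length on both sides, and the names (`Piece`, `residues`, `fill`) are hypothetical. It shows sequence assembly and on-demand residue retrieval: only the pieces overlapping the requested range are read, and uncovered positions are filled with `N`. The two-contig assembly is made-up data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Piece:
    """One child of the composition symmetry: source:src_start..src_end -> composite:start..end."""
    source: str
    src_start: int
    src_end: int
    start: int
    end: int


class CompositeBioSeq:
    def __init__(self, length: int, pieces: List[Piece], sources: Dict[str, str]):
        self.length = length
        self.pieces = pieces
        self.sources = sources  # component BioSeq residues, keyed by id

    def residues(self, start: int = 0, end: Optional[int] = None, fill: str = "N") -> str:
        """Residues for start..end; positions not covered by any piece come back as `fill`."""
        end = self.length if end is None else end
        out = [fill] * (end - start)
        for p in self.pieces:
            lo, hi = max(start, p.start), min(end, p.end)
            if lo >= hi:
                continue  # this piece does not overlap the requested range
            offset = p.src_start - p.start  # composite position -> source position
            out[lo - start:hi - start] = self.sources[p.source][lo + offset:hi + offset]
        return "".join(out)


# Hypothetical two-contig assembly with a 2-base gap and an unfilled tail.
assembly = CompositeBioSeq(
    length=20,
    pieces=[Piece("contig1", 0, 8, 0, 8), Piece("contig2", 0, 6, 10, 16)],
    sources={"contig1": "ACGTACGT", "contig2": "TTGGCC"},
)
```

Here `assembly.residues()` gives `"ACGTACGTNNTTGGCCNNNN"` and `assembly.residues(6, 12)` gives `"GTNNTT"`.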
22. Genometry Modelling of Reverse Complement
•  Sequence B = reverse complement of Sequence A
•  BioSeq A, length x; Composite BioSeq B, length x
•  Composition symmetry Sym AB: A:0..x, B:x..0
SeqA: AGGCAATTAATTGATCCAGGTGGAGTCCGAATAGGGTTAGCGA
SeqB: TCGCTAACCCTATTCGGACTCCACCTGGATCAATTAATTGCCT
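The composition A:0..x, B:x..0 pairs a forward span with a reversed one, so an interbase position p on A maps to x − p on B. The map is its own inverse. The Python sketch below uses hypothetical function names and assumes DNA residues. It derives B's residues on demand from A by reversing the span and complementing the bases, and reproduces the slide's SeqB.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def a_to_b(pos: int, x: int) -> int:
    """Map an interbase position through A:0..x <-> B:x..0; the map is its own inverse."""
    return x - pos


def b_residues(a_residues: str, start: int, end: int) -> str:
    """Residues of composite B for B:start..end, computed on demand from A."""
    x = len(a_residues)
    # B:start..end corresponds to the reversed span A:(x - end)..(x - start).
    a_start, a_end = a_to_b(end, x), a_to_b(start, x)
    return a_residues[a_start:a_end][::-1].translate(COMPLEMENT)


SEQ_A = "AGGCAATTAATTGATCCAGGTGGAGTCCGAATAGGGTTAGCGA"
```

For example, `b_residues(SEQ_A, 39, 43)` reads A:0..4 (`"AGGC"`) and returns `"GCCT"`, the last four bases of SeqB.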
23. MultiSequence Alignments
•  MultiSeqAlignment
–  Alignments sliced “horizontally”: each “row” in an alignment is a CompositeBioSeq whose composition maps another BioSeq to the same coordinate space as the alignment
•  Can also slice vertically (synteny)
24. Alignment Representations
•  Can represent the same alignment as either MultiSeqAlignment or Synteny
•  Transformation from horizontal slicing (MultiSeqAlignment) to vertical slicing (Synteny)
25. Complete Genometry Core Models
•  Mutability
•  Curations
26. Genometry Manipulations
•  Symmetry Intersection (AND)
•  Symmetry Union (OR)
•  Symmetry Inverse (NOT)
•  Symmetry Mutual Exclusion (XOR)
•  Symmetry Transformation / Mapping
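The four combination operations can be sketched in Python as set operations on a symmetry's coverage of a single BioSeq. The coverage is a list of interbase spans. This is a simplification: real symmetries can span several sequences, and the function names here are hypothetical. Every span boundary splits the axis into elementary segments that lie entirely inside or outside each input. Each operation is just a rule for which segments to keep, after which adjacent survivors are merged.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]


def normalize(spans: List[Span]) -> List[Span]:
    """Sort spans and merge any that overlap or touch."""
    merged: List[Span] = []
    for s, e in sorted(spans):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def _combine(a: List[Span], b: List[Span], keep: Callable[[bool, bool], bool]) -> List[Span]:
    # Elementary segments between consecutive boundaries are wholly inside or outside each input.
    points = sorted({p for s, e in a + b for p in (s, e)})

    def covered(spans: List[Span], x: int) -> bool:
        return any(s <= x < e for s, e in spans)

    kept = [(lo, hi) for lo, hi in zip(points, points[1:]) if keep(covered(a, lo), covered(b, lo))]
    return normalize(kept)


def sym_and(a: List[Span], b: List[Span]) -> List[Span]:
    return _combine(a, b, lambda x, y: x and y)


def sym_or(a: List[Span], b: List[Span]) -> List[Span]:
    return _combine(a, b, lambda x, y: x or y)


def sym_xor(a: List[Span], b: List[Span]) -> List[Span]:
    return _combine(a, b, lambda x, y: x != y)


def sym_not(a: List[Span], length: int) -> List[Span]:
    # Inverse relative to the whole sequence 0..length.
    return _combine(a, [(0, length)], lambda x, y: y and not x)


SYM_A = [(0, 10), (20, 30)]
SYM_B = [(5, 25)]
```

With these inputs, AND gives `[(5, 10), (20, 25)]` and OR gives `[(0, 30)]`. XOR gives `[(0, 5), (10, 20), (25, 30)]`, and NOT(A) on a 40-base sequence gives `[(10, 20), (30, 40)]`.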
27. Symmetry Combination Operations
Figure: SymA and SymB, with the results XOR(A, B), AND(A, B), OR(A, B), NOT(A) and NOT(B).
28. Genometry Transformations
•  Every symmetry of breadth > 1 describes a mapping between different sequences
•  Therefore every such symmetry can be used to transform the coordinates of other symmetries from one sequence to another
•  Because sequence annotations, alignments, and composition are all based on symmetries, any of them can be used as a mapping
•  Discontiguous linear mapping algorithm
•  Results of transformation are also symmetries
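A discontiguous linear mapping can be sketched in Python as clipping the input span against each child of the mapping symmetry. Each overlapping piece is then shifted into the child's target coordinates. This is not the Genometry algorithm itself: it assumes forward-strand children with equal-length source and target spans (no scaling), and `transform` is a hypothetical name. Each returned (source piece, target piece) pair is itself a small symmetry, so results can be fed into further transformations.

```python
from typing import List, Tuple

Span = Tuple[int, int]
# Child of a mapping symmetry: (source_start, source_end, target_start, target_end),
# forward strand, equal-length spans.
Child = Tuple[int, int, int, int]


def transform(span: Span, mapping: List[Child]) -> List[Tuple[Span, Span]]:
    """Map a source span through every child of the mapping symmetry that it overlaps."""
    s, e = span
    result = []
    for src_s, src_e, tgt_s, _tgt_e in mapping:
        lo, hi = max(s, src_s), min(e, src_e)  # clip the span to this child
        if lo < hi:
            result.append(((lo, hi), (tgt_s + lo - src_s, tgt_s + hi - src_s)))
    return result


# Transcript-to-genome children from the annotation slides (T coords -> G coords).
T_TO_G = [(0, 200, 1000, 1200), (200, 700, 3000, 3500), (700, 1200, 4500, 5000)]
```

For example, T:100..300 crosses the E1/E2 exon boundary. It maps to the two genomic pieces G:1100..1200 and G:3000..3100.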
29. Coordinate Mapping
Example: mapping a domain from protein coordinates to genomic coordinates.
•  domainOnProtein (proteinSpanA on Annotated ProteinSeq P) is transformed via protein2mRNA (proteinSpanB, mrnaSpanB), giving domain2genomic with proteinSpanA and mrnaSpanA
•  That result is transformed via mRNA2genomic (mrnaSpanC, genomicSpanC; children m2gSub0..m2gSub2 with mSpanC0..mSpanC2 and gSpanC0..gSpanC2, linking Annotated mRNASeq M and Annotated GenomicSeq G), giving domain2genomic with proteinSpanA, mrnaSpanA and genomicSpanA
–  Children d2gSub0 (pSpanA0, mSpanA0, gSpanA0) and d2gSub1 (pSpanA1, mSpanA1, gSpanA1)
•  The “growing” domain2genomic result is a MutableSeqSymmetry
•  Note that the domain mapped to the spliced transcript only overlaps two of the three exons, hence the resulting domain2genomic symmetry ends up with only two children
Legend: AnnotatedBioSeq (BioSeq); SeqSymmetry (SeqAnnot).
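The two-step mapping on this slide can be sketched in Python. A domain goes first through protein2mRNA, then through mRNA2genomic, and the result gets one child per exon the domain touches. The linear "3 nucleotides per residue from `cds_start`" protein2mRNA map, the coding start of 50 and the domain P:200..250 are all hypothetical. The exon layout reuses the transcript example. Child protein coordinates are kept as exact `Fraction`s, because an exon boundary can fall inside a codon.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Span = Tuple[int, int]
Exon = Tuple[int, int, int, int]  # (mrna_start, mrna_end, genomic_start, genomic_end)


def domain_to_genomic(domain: Span, cds_start: int, exons: List[Exon]) -> Dict:
    # Step 1, protein2mRNA: assumed linear map of 3 nucleotides per residue from cds_start.
    m_start, m_end = cds_start + 3 * domain[0], cds_start + 3 * domain[1]
    # Step 2, mRNA2genomic: one child per exon that the mRNA span overlaps.
    children = []
    for ms, me, gs, _ge in exons:
        lo, hi = max(m_start, ms), min(m_end, me)
        if lo < hi:
            children.append({
                "protein": (Fraction(lo - cds_start, 3), Fraction(hi - cds_start, 3)),
                "mrna": (lo, hi),
                "genomic": (gs + lo - ms, gs + hi - ms),
            })
    return {"protein": domain, "mrna": (m_start, m_end), "children": children}


EXONS = [(0, 200, 1000, 1200), (200, 700, 3000, 3500), (700, 1200, 4500, 5000)]
result = domain_to_genomic((200, 250), cds_start=50, exons=EXONS)
```

The domain maps to M:650..800, which overlaps only the second and third exons. As on the slide, the result therefore has two children, at G:3450..3500 and G:4500..4600.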
34. Transformations Applications
•  Mapping Affy probes to genome
•  Mapping contig annotations to larger genomic assemblies
•  Mapping protein annotations to genome
•  Mapping genomic annotations to proteins and transcripts (SNPs, for example)
•  Sequence slice-and-dice with annotation propagation
•  Propagation of annotations across versioned sequences (such as Golden Path)
•  Deep mappings (for example, SNP to genomeA to transcriptB to proteinC to homolog proteinD to transcriptE to genomeF to putative SNP location in genomeF: symmetry path of depth 5)
•  Etc., etc.
35. Prototypes & Applications
•  GenometryTest
•  Generic Genometry Viewer
•  ProtAnnot (Ann)
•  GPView (Cyrus)
•  AlignView (Eric)
•  ContigViewer (Peter, Barry)
•  Unibrow (Transcriptome Group)
36. Genometry Summary
•  Genometry presents a unified model for location-based sequence relationships
•  Sequence annotation, composition, and alignment are all based on SeqSymmetry
•  Provides powerful Genometry manipulations: any SeqSymmetry can be used to map other SeqSymmetries across sequences / coordinate spaces
•  Work in progress