Dot plots

                    Dr Avril Coghlan
                   alc@sanger.ac.uk

Note: this talk contains animations which can only be seen by
downloading it and using ‘View Slide Show’ in PowerPoint
Dot plots
• How can we compare the human & Drosophila
  melanogaster Eyeless protein sequences?
  One method is a dotplot
• A dotplot is a graphical method for assessing
  similarity
  Make a matrix (table) with one row for each letter in sequence 1, & one
       column for each letter in sequence 2
  Colour in each cell where the row letter and the column letter are identical
  Regions of local similarity between the 2 sequences appear as diagonal
       lines of coloured cells (‘dots’)
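e.g. for sequences ‘RQQEPVRSTC’ (sequence 1, rows) and ‘QQESGPVRST’ (sequence 2, columns):

                   Q  Q  E  S  G  P  V  R  S  T     Sequence 2
               R   .  .  .  .  .  .  .  x  .  .
               Q   x  x  .  .  .  .  .  .  .  .
               Q   x  x  .  .  .  .  .  .  .  .
               E   .  .  x  .  .  .  .  .  .  .
Sequence 1     P   .  .  .  .  .  x  .  .  .  .
               V   .  .  .  .  .  .  x  .  .  .
               R   .  .  .  .  .  .  .  x  .  .
               S   .  .  .  x  .  .  .  .  x  .
               T   .  .  .  .  .  .  .  .  .  x
               C   .  .  .  .  .  .  .  .  .  .

     (x = coloured cell, . = empty cell)

     Regions of local similarity between the 2 sequences appear as
     diagonal lines (here QQE and PVRST)
     Some off-diagonal dots may be due to chance similarities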
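
The construction described above can be sketched in a few lines of base R. This is only an illustrative sketch, not the code used to draw the slides; the function name `dot_matrix` is hypothetical.

```r
# Hypothetical helper (not from the slides): build the identity dot matrix.
dot_matrix <- function(seq1, seq2) {
  s1 <- strsplit(seq1, "")[[1]]   # letters of sequence 1 (rows)
  s2 <- strsplit(seq2, "")[[1]]   # letters of sequence 2 (columns)
  m <- outer(s1, s2, "==")        # TRUE where the two letters are identical
  dimnames(m) <- list(s1, s2)
  m
}

dots <- dot_matrix("RQQEPVRSTC", "QQESGPVRST")

# Print the matrix as text: "x" for a coloured cell, "." for an empty one.
grid <- ifelse(dots, "x", ".")
print(noquote(grid))
```

Each TRUE cell corresponds to one 'dot'; diagonal runs of TRUE cells are the regions of local similarity.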
Problem
• Make a dot-plot for DNA sequences “GCATCGGC” &
  “CCATCGCCATCG”. Are there regions of similarity?
Answer
• Make a dot-plot for DNA sequences “GCATCGGC” &
  “CCATCGCCATCG”. Are there regions of similarity?
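       C  C  A  T  C  G  C  C  A  T  C  G
   G   .  .  .  .  .  x  .  .  .  .  .  x
   C   x  x  .  .  x  .  x  x  .  .  x  .
   A   .  .  x  .  .  .  .  .  x  .  .  .
   T   .  .  .  x  .  .  .  .  .  x  .  .
   C   x  x  .  .  x  .  x  x  .  .  x  .
   G   .  .  .  .  .  x  .  .  .  .  .  x
   G   .  .  .  .  .  x  .  .  .  .  .  x
   C   x  x  .  .  x  .  x  x  .  .  x  .

   (x = coloured cell, . = empty cell)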

  CATCG in sequence 1 appears twice in sequence 2
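
Diagonal runs like these can also be listed programmatically rather than read off the plot by eye. The following base R sketch scans every diagonal of the identity matrix for runs of at least `min_len` identical letters; the function name `diagonal_runs` and the `min_len` parameter are hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not from the slides): list diagonal runs of identities.
diagonal_runs <- function(seq1, seq2, min_len = 3) {
  s1 <- strsplit(seq1, "")[[1]]
  s2 <- strsplit(seq2, "")[[1]]
  m <- outer(s1, s2, "==")              # identity dot matrix
  offsets <- col(m) - row(m)            # cells on one diagonal share an offset
  runs <- list()
  for (d in sort(unique(as.vector(offsets)))) {
    rows <- row(m)[offsets == d]        # row indices along this diagonal
    r <- rle(m[offsets == d])           # run lengths of TRUE/FALSE along it
    ends <- cumsum(r$lengths)
    starts <- ends - r$lengths + 1
    for (k in which(r$values & r$lengths >= min_len)) {
      i <- rows[starts[k]]              # start position in sequence 1
      runs[[length(runs) + 1]] <- data.frame(
        start1 = i, start2 = i + d, length = r$lengths[k],
        match = substr(seq1, i, i + r$lengths[k] - 1)
      )
    }
  }
  do.call(rbind, runs)                  # NULL if no run is long enough
}

runs <- diagonal_runs("GCATCGGC", "CCATCGCCATCG")
print(runs)
```

For these two sequences the sketch reports two runs, both the word CATCG starting at position 2 of sequence 1, matching positions 2 and 8 of sequence 2.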
Dot plots with thresholds
• If you colour in all cells with an identical letter, some
  dots may be due to chance similarities
• Therefore, it is common to use a threshold to decide
  whether to plot a ‘dot’ in a cell
  A window of a certain size (e.g. window size = 3) is moved along all possible
        diagonals, one by one
  A score is calculated for each position of the window on a diagonal:
        the number of identical letters in the window
  If the score is equal to or above the threshold (e.g. threshold = score of
        2), all the cells in the window are coloured in
  The values of the window size and threshold for the dot plot
        are chosen by trial and error
e.g. for sequences “GCATCGGC” and “CCATCGCCATCG”, using a window
      size of 3, and a threshold of ≥2:

  [Figure: grid with “GCATCGGC” as rows and “CCATCGCCATCG” as columns; an
  animated sliding window of 3 cells moves along each diagonal. Example
  window scores shown in the animation: 2 (≥ threshold → colour in), 3, 0, 1,
  and so on]
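
The sliding-window procedure can be sketched in base R as follows. This is a minimal illustration of the rule as stated on the slide (score = number of identities in the window; colour every cell of a window whose score reaches the threshold), not the author's `dotplot.R`; the function name `windowed_dots` is hypothetical.

```r
# Hypothetical helper (not from the slides): windowed dot plot with a threshold.
windowed_dots <- function(seq1, seq2, windowsize = 3, threshold = 2) {
  s1 <- strsplit(seq1, "")[[1]]
  s2 <- strsplit(seq2, "")[[1]]
  ident <- outer(s1, s2, "==")                  # identical-letter cells
  dots <- matrix(FALSE, length(s1), length(s2), dimnames = list(s1, s2))
  steps <- seq_len(windowsize) - 1              # offsets 0 .. windowsize - 1
  n_i <- length(s1) - windowsize + 1            # number of window starts per row
  n_j <- length(s2) - windowsize + 1            # number of window starts per column
  if (n_i < 1 || n_j < 1) return(dots)          # window longer than a sequence
  for (i in seq_len(n_i)) {
    for (j in seq_len(n_j)) {
      cells <- cbind(i + steps, j + steps)      # window cells along the diagonal
      score <- sum(ident[cells])                # number of identities in the window
      if (score >= threshold) dots[cells] <- TRUE   # colour in the whole window
    }
  }
  dots
}

dots <- windowed_dots("GCATCGGC", "CCATCGCCATCG", windowsize = 3, threshold = 2)
print(noquote(ifelse(dots, "x", ".")))
```

Note that with a threshold below the window size, a coloured window can include cells whose letters differ: the first window on the main diagonal (G/C, C/C, A/A) scores 2, so its mismatching first cell is coloured too.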
Real data: fruitfly & human Eyeless
• A dot plot of fruitfly & human Eyeless proteins:
  [Figure: dot plot with axes labelled ‘Fruitfly Eyeless’ and ‘Human Eyeless’;
  window-size = 10, threshold = 3]
  Do you think we chose a good value for the
  window-size and threshold?
Real data: fruitfly & human Eyeless
• Here is a dot plot of fruitfly and human Eyeless
  proteins, made using windowsize=10, threshold=5:
  [Figure: dot plot with axes labelled ‘Fruitfly Eyeless’ and ‘Human Eyeless’;
  window-size = 10, threshold = 5]
  Are there any regions of similarity?
Pros and cons of dot plots
• Advantages
  A dot plot can be used to identify long regions of strong similarity
  between two sequences
  It produces a plot, which is easy to make and to interpret
  It can be used to compare very short or long sequences (even whole
        chromosomes – millions of bases)
• Disadvantages
  It is necessary to find the best window size and threshold by trial and
  error
  A dot plot can only be used to compare 2 sequences, not >2 sequences
  It doesn’t tell you what mutations occurred in the region of
  similarity (if there is one) since the two sequences shared a
  common ancestor
Software for making dotplots
• dotPlot() function in the SeqinR R library
  Allows you to specify a windowsize and threshold
  If the score in a window is ≥ the threshold, it colours in the 1st cell in
        the window (not all cells)
• EMBOSS dottup
  Allows you to specify a windowsize but not a threshold
  If all cells in a window are identities, it colours in all cells in the window
• EMBOSS dotmatcher
  Allows you to specify a windowsize and threshold
  Instead of using the number of identities in a window as the window
        score, it calculates a more complex score based on the
  similarities of the bases/amino acids
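
As an illustration of the first option, a call to seqinr's `dotPlot()` might look like the sketch below. The argument names `wsize` (window size), `wstep` (step between windows) and `nmatch` (number of matches needed to draw a dot) are given as recalled from the package documentation; check `?dotPlot` in your installed version before relying on them. The call is guarded so that the snippet still runs if seqinr is not installed.

```r
# Split each sequence into a vector of single letters, as dotPlot() expects.
seq1b <- strsplit("RQQEPVRSTC", "")[[1]]
seq2b <- strsplit("QQESGPVRST", "")[[1]]

# Draw the plot only if the seqinr package is available.
if (requireNamespace("seqinr", quietly = TRUE)) {
  # Assumed arguments: window size 3, slide one letter at a time, need 3 matches.
  seqinr::dotPlot(seq1b, seq2b, wsize = 3, wstep = 1, nmatch = 3)
} else {
  message("seqinr is not installed; skipping the plot")
}
```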
Problem
• Make a dot-plot for amino acid sequences
  “RQQEPVRSTC” and “QQESGPVRST”, using a
  window size of 3, and a threshold of ≥3
Answer
•   Make a dot-plot for sequences “RQQEPVRSTC” and “QQESGPVRST”,
    using window size: 3, threshold: ≥3

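                Q  Q  E  S  G  P  V  R  S  T
            R   .  .  .  .  .  .  .  .  .  .
            Q   x  .  .  .  .  .  .  .  .  .
            Q   .  x  .  .  .  .  .  .  .  .
            E   .  .  x  .  .  .  .  .  .  .
            P   .  .  .  .  .  x  .  .  .  .
            V   .  .  .  .  .  .  x  .  .  .
            R   .  .  .  .  .  .  .  x  .  .
            S   .  .  .  .  .  .  .  .  x  .
            T   .  .  .  .  .  .  .  .  .  x
            C   .  .  .  .  .  .  .  .  .  .

            (x = coloured cell, . = empty cell)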
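
When the threshold equals the window size, a window is coloured only if all of its letters match, so the answer can be cross-checked by listing the 3-letter words shared by the two sequences. A minimal base R sketch, assuming both sequences are at least `k` letters long (the function name `shared_words` is hypothetical):

```r
# Hypothetical helper (not from the slides): list words of length k shared by
# two sequences, with their start positions in each sequence.
shared_words <- function(seq1, seq2, k = 3) {
  starts1 <- seq_len(nchar(seq1) - k + 1)
  starts2 <- seq_len(nchar(seq2) - k + 1)
  words1 <- substring(seq1, starts1, starts1 + k - 1)   # k-letter words in sequence 1
  words2 <- substring(seq2, starts2, starts2 + k - 1)   # k-letter words in sequence 2
  hits <- which(outer(words1, words2, "=="), arr.ind = TRUE)
  data.frame(word = words1[hits[, "row"]],
             start1 = starts1[hits[, "row"]],
             start2 = starts2[hits[, "col"]])
}

words <- shared_words("RQQEPVRSTC", "QQESGPVRST", k = 3)
print(words)
```

The sketch finds four shared words: QQE, PVR, VRS and RST. The last three overlap, which is why they merge into the single PVRST diagonal in the plot.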
Further reading
•   Chapter 3 in Introduction to Computational Genomics by Cristianini & Hahn
•   Practical on dotplots in R in the Little Book of R for Bioinformatics:
    https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter4.html


Editor's Notes

  1. In R:
       setwd("C:/Documents and Settings/Avril Coughlan/My Documents/BACKEDUP/MScCourseLectures/MB6301Lectures/MB6301_Ls3456_Aln")
       library("seqinr")
       seq1 <- "RQQEPVRSTC"
       seq2 <- "QQESGPVRST"
       seq1b <- s2c(seq1)     # split the string into a vector of letters
       seq2b <- s2c(seq2)
       source("dotplot.R")    # the author's helper script
       makeDotPlot1(seq1b, seq2b, dotsize=1)
  2. In R:
       setwd("C:/Documents and Settings/Avril Coughlan/My Documents/BACKEDUP/MScCourseLectures/MB6301Lectures/MB6301_Ls3456_Aln")
       library("seqinr")
       seq1 <- "GCATCGGC"
       seq2 <- "CCATCGCCATCG"
       seq1b <- s2c(seq1)
       seq2b <- s2c(seq2)
       source("dotplot.R")
       makeDotPlot1(seq1b, seq2b, dotsize=1)
  3. In R:
       setwd("C:/Documents and Settings/Avril Coughlan/My Documents/BACKEDUP/MScCourseLectures/MB6301Lectures/MB6301_Ls3456_Aln")
       library("seqinr")
       seq1 <- "GCATCGGC"
       seq2 <- "CCATCGCCATCG"
       seq1b <- s2c(seq1)
       seq2b <- s2c(seq2)
       source("dotplot.R")
       makeDotPlot2(seq1b, seq2b, dotsize=1, windowsize=3, threshold=2)
  4. In R:
       setwd("C:/Documents and Settings/Avril Coughlan/My Documents/BACKEDUP/MScCourseLectures/MB6301Lectures/MB6301_Ls3456_Aln")
       library("seqinr")
       seq1 <- read.fasta("human.fa")   # human Eyeless
       seq2 <- read.fasta("fly.fa")     # fruitfly Eyeless
       seq1b <- seq1[[1]]               # first sequence in each file
       seq2b <- seq2[[1]]
       source("dotplot.R")
       makeDotPlot2(seq1b, seq2b, dotsize=1, windowsize=10, threshold=3)
     Saved picture as dotplot2.png
  5. In R:
       setwd("C:/Documents and Settings/Avril Coughlan/My Documents/BACKEDUP/MScCourseLectures/MB6301Lectures/MB6301_Ls3456_Aln")
       library("seqinr")
       seq1 <- read.fasta("human.fa")   # human Eyeless
       seq2 <- read.fasta("fly.fa")     # fruitfly Eyeless
       seq1b <- seq1[[1]]
       seq2b <- seq2[[1]]
       source("dotplot.R")
       makeDotPlot2(seq1b, seq2b, dotsize=1, windowsize=10, threshold=5)
     Saved picture as dotplot1.png
  6. In R:
       setwd("C:/Documents and Settings/Avril Coughlan/My Documents/BACKEDUP/MScCourseLectures/MB6301Lectures/MB6301_Ls3456_Aln")
       library("seqinr")
       seq1 <- "RQQEPVRSTC"
       seq2 <- "QQESGPVRST"
       seq1b <- s2c(seq1)
       seq2b <- s2c(seq2)
       source("dotplot.R")
       makeDotPlot2(seq1b, seq2b, dotsize=1, windowsize=3, threshold=3)