Pioneering Scientific Intelligence

DNA/Small RNA Alignment in Avadis NGS 1.3

Strictly Confidential   © Strand Life Sciences

Questions we will seek to answer in this presentation

What is an Alignment algorithm?
What issues must an Alignment algorithm consider?
How do Alignment algorithms work?
How does CoBWeb work?
How does CoBWeb compare with other algorithms?
How is CoBWeb exposed in Avadis NGS?
What is the future evolution of CoBWeb?

What is an Alignment algorithm?




Subject’s Genome:
AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC

Reference Genome, close but not quite the same as the Subject’s Genome:
AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC

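To make "close but not quite the same" concrete, the two sequences on this slide can be compared base by base. The following is a minimal sketch, not part of the original deck; the function name mismatch_positions is hypothetical, and it assumes the two sequences have equal length and differ only by substitutions.

```python
def mismatch_positions(subject, reference):
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(subject) != len(reference):
        raise ValueError("this sketch only compares equal-length sequences")
    return [i for i, (s, r) in enumerate(zip(subject, reference)) if s != r]


# The two sequences shown on the slide.
subject = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"
reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"

for pos in mismatch_positions(subject, reference):
    print(pos, "subject:", subject[pos], "reference:", reference[pos])
```

A real aligner cannot rely on this position-by-position comparison, because insertions and deletions shift the two sequences relative to each other.
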
What issues must an Alignment algorithm consider?

Mismatches and Gaps

[Figure labels: Reference Genome, Reads, Deletion, SNP]

Handling paired reads

[Figure labels: Subject’s Genome, ×, Reference Genome, Repeat Region, Repeat Region]

A variety of Read Lengths

Short reads: ~50, few mismatches and gaps
Long reads: few hundreds to thousands, many more mismatches and gaps

Speed and Memory

Billions of reads.
Run in 4GB RAM
Scale speed with more memory
Allow use of multiple cores/processors

How do Alignment algorithms work?




Indexing the Genome to find Seed Matches

Scanning the Reference for each Read takes too long.

The Reference Index: The Index very quickly yields locations in the Reference where some part (seed) of the Read matches.

This Seed occurs at Reference locations x1, x2…
This Seed occurs at Reference locations x3, x4…

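The idea of an index that maps seeds to Reference locations can be sketched with a plain dictionary of fixed-length substrings. This is only an illustration under stated assumptions: the names build_seed_index and seed_hits are hypothetical, the fixed seed length is chosen for the example, and a hash table stands in for whatever index a real aligner uses.

```python
from collections import defaultdict


def build_seed_index(reference, seed_len):
    """Map every substring of length seed_len to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def seed_hits(read, index, seed_len):
    """Cut the read into non-overlapping seeds and look each one up in the index."""
    hits = []
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        # .get avoids creating empty entries for seeds that never occur.
        hits.append((offset, seed, index.get(seed, [])))
    return hits


reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
index = build_seed_index(reference, seed_len=4)
for offset, seed, locations in seed_hits("CATGTCCC", index, seed_len=4):
    print(offset, seed, locations)
```

The index is built once per Reference and reused for every Read, which is what makes the lookup cheap compared with scanning.
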
Detailed Alignment at Seed Match Locations

[Figure labels: Reference, Seed Match, Read]

How many Mismatches and Gaps are needed for the Read to match around the Seed? Smith-Waterman or Dynamic Programming

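As an illustration of the dynamic programming step, here is a minimal Smith-Waterman local alignment score. It is a sketch, not the scoring used by any particular aligner: the scores (match +1, mismatch -1, gap -1) are assumptions chosen for the example, and only the best score is returned, not the alignment itself.

```python
def smith_waterman_score(read, reference, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between read and reference (linear gap cost)."""
    rows, cols = len(read) + 1, len(reference) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if read[i - 1] == reference[j - 1] else mismatch
            score[i][j] = max(
                0,                                  # start a new local alignment
                score[i - 1][j - 1] + substitution, # match or mismatch
                score[i - 1][j] + gap,              # gap in the reference
                score[i][j - 1] + gap,              # gap in the read
            )
            best = max(best, score[i][j])
    return best


print(smith_waterman_score("ACGTACGT", "ACGTTACGT"))
```

In a seed-and-extend aligner this computation would be restricted to a window of the Reference around each seed match rather than run against the whole genome.
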
The Burrows-Wheeler based Index

The original Reference: C G A C $

All its circular shifts, sorted lexicographically:

Circular Shift Index   Sorted circular shift   BWT (last column)
2                      A C $ C G               G
0                      C G A C $               $
3                      C $ C G A               A
1                      G A C $ C               C
4                      $ C G A C               C

The Index comprises these along with some housekeeping data structures.

The Circular Shift Indices can be sampled to fit into reduced memory at the expense of speed without sacrificing correctness.

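The table on this slide can be reproduced with a few lines of code. The sketch below is an illustration only: the function names are hypothetical, the sort places $ after every base because that is the order the slide uses, and the construction by sorting whole rotations is far too slow for a real genome. The last function shows why sampling the shift indices costs speed but not correctness: an unsampled index is recovered by stepping backwards through the BWT until a sampled row is reached.

```python
def sort_key(text):
    # The slide sorts '$' after every base, so map it above the letters.
    return text.replace("$", "~")


def bwt_index(reference):
    """Return (sorted circular shift indices, BWT) for reference + '$'."""
    text = reference + "$"
    shifts = sorted(range(len(text)), key=lambda i: sort_key(text[i:] + text[:i]))
    bwt = "".join(text[i - 1] for i in shifts)  # last column of each sorted shift
    return shifts, bwt


def lf(bwt, row):
    """Row of the circular shift that starts one position earlier."""
    c = bwt[row]
    smaller = sum(1 for x in bwt if sort_key(x) < sort_key(c))
    return smaller + bwt[:row].count(c)


def shift_from_samples(bwt, samples, row):
    """Recover the shift index of a row when only some rows are stored."""
    steps = 0
    while row not in samples:
        row = lf(bwt, row)
        steps += 1
    return (samples[row] + steps) % len(bwt)


shifts, bwt = bwt_index("CGAC")
samples = {row: s for row, s in enumerate(shifts) if s % 2 == 0}  # keep every 2nd
print(shifts, bwt, [shift_from_samples(bwt, samples, r) for r in range(len(bwt))])
```
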
The Burrows-Wheeler based Index

[Figure labels: Reference, EXACT Match, Read]

All Exact Matches of a Read (NO Mismatches or Gaps) in the Reference can be found in time proportional to the length of the Read and largely independent of the size of the Reference.

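The exact-match property comes from backward search: the Read is consumed one base at a time from its last base to its first, and each step narrows a range of rows in the sorted index. The following is a minimal sketch with hypothetical function names. It uses Python's default ordering, in which $ sorts before the bases (the opposite of the previous slide's table, which does not affect correctness), and it stores full occurrence tables, which a memory-conscious index would sample.

```python
def build_fm_index(reference):
    """Suffix array, first-column offsets and occurrence counts for reference + '$'."""
    text = reference + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    first, total = {}, 0
    for c in sorted(set(text)):
        first[c] = total              # number of characters smaller than c
        total += text.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in first}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, first, occ


def find_exact(read, fm_index):
    """All start positions where read occurs exactly in the reference."""
    sa, first, occ = fm_index
    lo, hi = 0, len(sa)               # current row range [lo, hi)
    for c in reversed(read):          # one narrowing step per base of the read
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


fm = build_fm_index("CGAC")
print(find_exact("C", fm), find_exact("AC", fm), find_exact("GG", fm))
```

The loop in find_exact runs once per base of the Read, which is where the "time proportional to the length of the Read" claim comes from.
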
How does CoBWeb work?




Seeding Strategy

This 15-mer occurs at locations x1, x2…
This 15-mer occurs at locations x3, x4…
This whole 30-mer occurs at location x5

Use the BW based index, augmented with additional data structures for speed, to find one or more Long Seed Matches in the Reference.

Justification: Most long Reads do not have Mismatches and Gaps strewn across their length; there are usually long stretches that match exactly. And Long Seeds will have few matching locations.

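The deck does not spell out how CoBWeb picks its seeds, so the following is only a sketch of the general idea that longer exact seeds have fewer matching locations. It greedily extends each seed for as long as it still occurs exactly in the Reference. The function names and the min_len parameter are hypothetical, and Python's str.find is a naive stand-in for the BW based index lookup.

```python
def occurrences(reference, pattern):
    """All start positions of pattern in reference (naive stand-in for the index)."""
    hits, start = [], reference.find(pattern)
    while start != -1:
        hits.append(start)
        start = reference.find(pattern, start + 1)
    return hits


def long_seeds(read, reference, min_len=4):
    """Greedily split the read into maximal stretches that match exactly."""
    seeds, i = [], 0
    while i < len(read):
        j = i
        # Extend while read[i:j + 1] still occurs somewhere in the reference.
        while j < len(read) and reference.find(read[i:j + 1]) != -1:
            j += 1
        if j - i >= min_len:
            seeds.append((i, read[i:j], occurrences(reference, read[i:j])))
        i = max(j, i + 1)  # skip the base that broke the match
    return seeds


reference = "GATTACAGATTACC"
print(occurrences(reference, "ATTAC"))    # shorter seed
print(long_seeds("ATTACC", reference))    # longer seed
```

In this toy Reference the 5-mer ATTAC occurs in two places while the 6-mer ATTACC occurs in only one, which mirrors the 15-mer versus 30-mer contrast on the slide.
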
Advantages

Seed length is not specified in advance, so Long and Short reads can be handled seamlessly.

Separating the Smith-Waterman phase from the BW Index search allows an unlimited number of gaps and mismatches.

How does CoBWeb compare with other algorithms?

Comparison with BWA

CoBWeb: 94% Alignment Score with up to 2 Gaps
BWA: 4% error + 1 gap of possibly multiple length

[Figure labels: Read Length 50, Read Length 150]

A little faster than BWA with comparable results

How is CoBWeb exposed in Avadis NGS?

Entry

Two new experiment types, DNA Alignment and Small-RNA Alignment

The Alignment Workflow

Run Alignment, and then create a DNA Variant or ChIP-Seq Experiment from the results.

Alignment Parameters

Specify number of Mismatches and Gaps, and handling of Multiple Matching.

Specify Adaptor Trimming (only for Small RNA) and 3’, 5’ trimming based on quality.

Screen against Contaminant Databases.

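The deck does not describe how Avadis NGS implements trimming, so the following is only a generic sketch of what adaptor trimming and quality-based end trimming usually mean. The function names, the exact-match adaptor search, the per-base quality list and the threshold are all assumptions made for the example.

```python
def trim_adaptor(read, quals, adaptor):
    """Cut the read at the first exact occurrence of the adaptor, if any."""
    cut = read.find(adaptor)
    if cut == -1:
        return read, quals
    return read[:cut], quals[:cut]


def trim_by_quality(read, quals, min_quality):
    """Drop bases below min_quality from the 5' (left) and 3' (right) ends."""
    start, end = 0, len(read)
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return read[start:end], quals[start:end]


# Hypothetical read, qualities and adaptor, for illustration only.
read, quals = trim_adaptor("ACGTACGTTGGAATTC", [5, 30, 30, 30, 30, 30, 8, 3] + [30] * 8, "TGGAATTC")
print(trim_by_quality(read, quals, min_quality=20))
```

Trimming the adaptor first and the low-quality ends second is a design choice of this sketch; low-quality bases inside the adaptor would otherwise prevent the exact adaptor match from being found at all.
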
What is the future evolution of CoBWeb?

ToDos

Chimeric Reads
RNA-Seq Alignment
Base Quality recalibration
Affine Gap Costs

http://www.avadis-ngs.com





Alignment of raw reads in Avadis NGS
