Suffix Arrays in Linear Time
Index text so that substring queries can be answered fast
The Text: C G A C G C T
Suffix Tree
[Figure: suffix tree of the text, with top-level branches A, C, G, T]
The Text: C G A C G C T
[Figure: the same suffix tree, used to answer the substring query C G C]
Trees take too much space.
Are there smaller indices?
The Text: C G A C G C T
Suffix Tree
[Figure: suffix tree of the text]
Suffix Array (sorted list of suffixes, given by starting position counting from 1): 3 1 4 6 2 5 7
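To make the example concrete, here is a minimal Python sketch (0-based positions; the function names are ours, not from the slides). It builds the suffix array of CGACGCT by plain sorting, which is the slow method the later slides improve on, and answers the substring query CGC with two binary searches, since all suffixes that start with the query form one contiguous block of the array.

```python
def naive_suffix_array(text):
    """0-based start positions of the suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text, sa, pattern):
    """All start positions of pattern in text, found by binary search over sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose first m characters are >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose first m characters are > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = naive_suffix_array("CGACGCT")
print([i + 1 for i in sa])               # [3, 1, 4, 6, 2, 5, 7]
print(occurrences("CGACGCT", sa, "CGC"))  # [3], i.e. position 4 counting from 1
```

Each query costs O(m log n) character comparisons for a pattern of length m, using only the text and one integer array.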
The Text: C G A C G C T
Burrows-Wheeler Index (an array)
Suffix Array: 3 1 4 6 2 5 7
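The Burrows-Wheeler array can be read directly off the suffix array: entry r is the character just before the r-th smallest suffix. The hedged Python sketch below assumes a sentinel "$" that does not occur in the text and sorts before every character; it builds the suffix array naively and includes a naive inverse to show that the transform loses no information. The function names are ours.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform of text, read off a naively built suffix array."""
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # Character before each sorted suffix; for the suffix starting at 0
    # the index -1 wraps around to the sentinel.
    return "".join(s[i - 1] for i in sa)


def inverse_bwt(last, sentinel="$"):
    """Naive inverse: rebuild the sorted rotation table one column at a time."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    # The rotation ending with the sentinel is the original text plus sentinel.
    return next(row for row in table if row.endswith(sentinel))[:-1]


print(bwt("CGACGCT"))  # TG$AGCCC
```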
How can one compute the Suffix Array in Linear Time?
Task
Given a string of length n with characters in the range 1..n, sort its suffixes lexicographically and obtain two arrays:
f[i]: the position of the ith suffix in sorted order
g[i]: which suffix is ith in sorted order
Sorting directly needs O(n log n) comparisons, each taking up to O(n) time.
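As a hedged baseline for the task (0-based positions, function name ours), Python's sorted() produces g directly, and inverting that permutation gives f. The comments note where the O(n log n) comparisons and the O(n) cost per comparison come from.

```python
def naive_arrays(text):
    """Return (f, g) for text using plain comparison sorting (0-based).

    g[r] is the start of the r-th smallest suffix (the suffix array);
    f[i] is the sorted position of the suffix starting at i (f inverts g).
    """
    n = len(text)
    # sorted() makes O(n log n) comparisons of suffix slices, each of which can
    # cost up to O(n); building the slices also takes O(n^2) time and memory.
    g = sorted(range(n), key=lambda i: text[i:])
    f = [0] * n
    for r, i in enumerate(g):
        f[i] = r
    return f, g


f, g = naive_arrays("CGACGCT")
print(g)  # [2, 0, 3, 5, 1, 4, 6]
print(f)  # [1, 4, 0, 2, 5, 3, 6]
```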
Divide and Conquer
Separate odd and even suffixes; sort each recursively, then combine.
Sorting Even Suffixes
[Figure: the text cut into consecutive pairs A1 A2, A3 A4, …]
Sort these n/2 pairs and map them to single characters in the range 1..n/2.
This gives a new text of half the length; sort its suffixes recursively.
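A hedged Python sketch of the renaming step, assuming 0-based positions (so the "even" suffixes start at 0, 2, 4, …), characters that are integers of at least 1, and a padding 0 for odd lengths. Suffix j of the halved text then sorts exactly like the suffix of the original text at position 2j. For brevity it ranks the pairs with sorted(); a linear-time version would radix sort them. The name halve is ours.

```python
def halve(s):
    """Rename the pairs (s[0], s[1]), (s[2], s[3]), ... by their sorted rank.

    Characters are ints >= 1; an odd-length text gets one padding 0, which
    sorts before every real character (so a shorter suffix sorts first).
    """
    padded = list(s) + [0] * (len(s) % 2)
    pairs = [(padded[i], padded[i + 1]) for i in range(0, len(padded), 2)]
    # sorted() is a comparison sort; a linear version would radix sort pairs.
    rank = {p: r + 1 for r, p in enumerate(sorted(set(pairs)))}
    return [rank[p] for p in pairs]


# C G A C G C T with A=1, C=2, G=3, T=4: pairs CG, AC, GC, T_ become 2 1 3 4.
print(halve([2, 3, 1, 2, 3, 2, 4]))  # [2, 1, 3, 4]
```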
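Sorting Odd Suffixes
Write each odd suffix O1, O2, O3, O4, … as a pair (A1, E1), (A2, E2), (A3, E3), (A4, E4), …: its first character followed by the even suffix that starts right after it.
Sort these n/2 pairs; the E's are the even suffixes, whose order we already know.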
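Because the second component of each pair is just a known rank, the odd suffixes can be ordered with two stable bucket passes (an LSD radix sort). This Python sketch assumes 0-based positions (odd suffixes start at 1, 3, 5, …), integer characters of at least 1, and a dictionary of even-suffix ranks supplied by the caller; the names bucket_sort and sort_odd are ours.

```python
def bucket_sort(items, key, k):
    """Stable sort of items by key(x), where key(x) is an int in 0..k."""
    buckets = [[] for _ in range(k + 1)]
    for x in items:
        buckets[key(x)].append(x)
    return [x for b in buckets for x in b]


def sort_odd(s, even_rank):
    """Order the odd suffixes of s (0-based positions 1, 3, 5, ...).

    even_rank maps each even position p < len(s) to the 1-based rank of
    suffix p among the even suffixes; the empty suffix gets rank 0.
    """
    odd = range(1, len(s), 2)
    # LSD radix sort on the pair (s[i], rank of suffix i + 1):
    # first by the second component ...
    odd = bucket_sort(odd, lambda i: even_rank.get(i + 1, 0), len(even_rank))
    # ... then, stably, by the first character.
    return bucket_sort(odd, lambda i: s[i], max(s, default=0))


# C G A C G C T (A=1, C=2, G=3, T=4); even suffixes sort as 2, 0, 4, 6.
print(sort_odd([2, 3, 1, 2, 3, 2, 4], {2: 1, 0: 2, 4: 3, 6: 4}))  # [3, 5, 1]
```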
Time Complexity
T(n) = O(n) + T(n/2) + time for merging even and odd suffixes
If merging takes O(n) time, this gives T(n) = O(n).
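Merging
An odd suffix is a pair (A, E) and an even suffix is a pair (B, O).
Do we have any information to determine the relative order of an odd suffix and an even one? The second components do not help: E is ranked only among the even suffixes, and O only among the odd ones.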
The Trick (Kärkkäinen and Sanders)
Split suffixes into 3 groups instead of 2, according to their starting position: 0 mod 3, 1 mod 3, and 2 mod 3.
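Sorting 0 and 1 Together
[Figure: the text ABCDEFGHIJKL cut into triplets]
Sort the 2n/3 triplets that start at positions 0 mod 3 and 1 mod 3, and map them to single characters.
This gives a new text of length 2n/3; sort its suffixes recursively.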
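Sorting the Suffixes in Group 2
Write each 2 mod 3 suffix 2_1, 2_2, 2_3, 2_4, … as a pair (A1, 0_1), (A2, 0_2), (A3, 0_3), (A4, 0_4), …: its first character followed by the 0 mod 3 suffix that starts right after it.
Sort these n/3 pairs; the 0's are the 0 mod 3 suffixes, whose order we already know.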
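Merging
Group 1 vs group 2: compare (A B, 0) with (C D, 1), that is, two characters followed by a 0 mod 3 suffix against two characters followed by a 1 mod 3 suffix.
Group 0 vs group 2: one character followed by a 1 mod 3 suffix against one character followed by a 0 mod 3 suffix.
We know the order of all 0 and 1 suffixes, so each comparison takes constant time and the merge is linear.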
Time Complexity
T(n) = O(n) + T(2n/3) + O(n), which solves to T(n) = O(n).
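The mod-3 scheme can be assembled end to end in a short Python sketch. It follows the slides' choice of sorting the 0 and 1 suffixes recursively and the 2 suffixes from pairs; Kärkkäinen and Sanders' published DC3 samples the 1 and 2 mod 3 suffixes instead, but the idea is the same. Assumptions of this sketch, not of the slides: 0-based positions, characters are integers 1..k, a 0 pads the end, bucket passes stand in for radix sort, and a dummy empty suffix is added at position n when n is a multiple of 3 so that the renamed 0 mod 3 part always ends in a unique triplet.

```python
def counting_sort(items, key, k):
    """Stable bucket sort of items by key(x), where 0 <= key(x) <= k."""
    buckets = [[] for _ in range(k + 1)]
    for x in items:
        buckets[key(x)].append(x)
    return [x for b in buckets for x in b]


def suffix_array(s, k):
    """0-based suffix array of s; characters of s must be ints in 1..k."""
    n = len(s)
    if n <= 3:  # tiny inputs: plain sorting is fine
        return sorted(range(n), key=lambda i: s[i:])
    t = list(s) + [0, 0, 0]  # 0 pads the end and is smaller than any character
    # Sample = suffixes at 0 mod 3 and 1 mod 3; range(0, n + 1, 3) includes
    # the dummy empty suffix at n exactly when n % 3 == 0.
    sample = list(range(0, n + 1, 3)) + list(range(1, n, 3))
    order = sample
    for d in (2, 1, 0):  # LSD radix sort of the triplets
        order = counting_sort(order, lambda p: t[p + d], k)
    name, m, prev = {}, 0, None
    for p in order:  # equal triplets get equal names in 1..m
        trip = (t[p], t[p + 1], t[p + 2])
        if trip != prev:
            m, prev = m + 1, trip
        name[p] = m
    reduced = [name[p] for p in sample]
    if m == len(reduced):  # all names distinct: they already give the order
        sa_r = [0] * m
        for idx, p in enumerate(sample):
            sa_r[name[p] - 1] = idx
    else:  # recurse on the reduced text of about 2n/3 characters
        sa_r = suffix_array(reduced, m)
    rank = [0] * (n + 3)  # rank 0 = empty suffix, below every real suffix
    sorted_sample = []
    for r, idx in enumerate(sa_r):
        rank[sample[idx]] = r + 1
        if sample[idx] < n:  # drop the dummy empty suffix
            sorted_sample.append(sample[idx])
    # Group 2: sort by (first char, rank of the following 0 mod 3 suffix).
    p2 = counting_sort(list(range(2, n, 3)), lambda i: rank[i + 1], len(sample))
    p2 = counting_sort(p2, lambda i: t[i], k)

    def smaller(i, j):  # i is 2 mod 3, j is 0 or 1 mod 3
        if j % 3 == 0:
            return (t[i], rank[i + 1]) < (t[j], rank[j + 1])
        return (t[i], t[i + 1], rank[i + 2]) < (t[j], t[j + 1], rank[j + 2])

    out, a, b = [], 0, 0  # linear merge of the two sorted lists
    while a < len(p2) and b < len(sorted_sample):
        if smaller(p2[a], sorted_sample[b]):
            out.append(p2[a])
            a += 1
        else:
            out.append(sorted_sample[b])
            b += 1
    return out + p2[a:] + sorted_sample[b:]


# C G A C G C T with A=1, C=2, G=3, T=4
print(suffix_array([2, 3, 1, 2, 3, 2, 4], 4))  # [2, 0, 3, 5, 1, 4, 6]
```

Adding 1 to each entry gives the slides' 3 1 4 6 2 5 7. The merge comparator is exactly the case split from the Merging slide: one character and a rank against a 0 suffix, two characters and a rank against a 1 suffix.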
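Generalization
Fix a set D of indices mod v. [Figure: the text marked at positions v, 2v, 3v, …]
Map the v characters starting at each position j with j mod v in D to a single character (as the triplets were for v = 3). This string has size |D|n/v, and creating it takes O(n|D|) time.
Sorting the suffixes of this string gives the sorted order of all suffixes that begin at indices j such that j mod v is in D.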
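Key Property of D
For any two indices i and j, (i - j) mod v is the distance between some two beads in D. Hence there is a shift x < v for which both i + x and j + x fall in D (mod v): compare the first x characters of suffixes i and j directly, then the known ranks of suffixes i + x and j + x.
D is a Difference Cover if the distances between beads in D include every value 0, 1, …, v-1.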
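Both parts of the key property can be checked by brute force. The hedged Python sketch below (function names ours) tests whether D is a difference cover mod v and finds the shift x < v used to compare suffixes i and j. With v = 3 and D = {0, 1} it recovers the shifts from the mod-3 merge: x = 1 against a 0 suffix and x = 2 against a 1 suffix.

```python
def is_difference_cover(D, v):
    """True if every residue 0..v-1 is a difference (a - b) mod v of beads in D."""
    return {(a - b) % v for a in D for b in D} == set(range(v))


def shift(i, j, D, v):
    """Smallest x < v such that (i + x) mod v and (j + x) mod v are both in D.

    Returns None if there is no such x (possible only if D is not a cover).
    """
    for x in range(v):
        if (i + x) % v in D and (j + x) % v in D:
            return x
    return None


# v = 3, D = {0, 1}: the mod-3 scheme from the earlier slides.
print(shift(2, 0, {0, 1}, 3))  # 1: a 2 suffix against a 0 suffix
print(shift(2, 1, {0, 1}, 3))  # 2: a 2 suffix against a 1 suffix
```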
Size of D
[Figure: a bead arrangement with two spans of length sqrt(v) marked]
There exists a Difference Cover of size 1.5*sqrt(v)!
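The 1.5*sqrt(v) construction the slide refers to is more involved. As a hedged illustration that O(sqrt(v)) beads always suffice, here is a simpler construction of our own choosing with about 2*sqrt(v) beads: k consecutive beads plus every kth bead, where k = ceil(sqrt(v)).

```python
import math


def simple_cover(v):
    """A difference cover mod v with at most 2k + 1 beads, k = ceil(sqrt(v)).

    Beads: the k consecutive residues 0..k-1 plus every multiple of k.
    Any d in 0..v-1 equals m*k - b with m = ceil(d / k) <= k and 0 <= b < k,
    so d is the distance from bead b to bead (m*k) mod v.
    """
    k = math.isqrt(v - 1) + 1  # smallest k with k * k >= v (for v >= 1)
    return set(range(k)) | {(m * k) % v for m in range(k + 1)}


print(sorted(simple_cover(13)))  # [0, 1, 2, 3, 4, 8, 12]
```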
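Time Complexity
T(n) = O(n|D|) + T(|D|n/v) + O(nv)
With |D| = O(sqrt(v)), up to constant factors:
T(n) = O(n sqrt(v)) + T(n/sqrt(v)) + O(nv)
For |D| = 2.5 sqrt(v), the recursion is on 2.5n/sqrt(v) characters, fewer than n whenever v > 6.25, so the subproblem sizes shrink geometrically and T(n) = O(nv): linear time for any fixed v.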

Uploaded by Strand Life Sciences Pvt Ltd.