 pig.sh
120816
   Abstract
   Construction
   Implementation
   Reference
   Alias: position tree, PAT tree
   Key contributors
    o Weiner (1973)    first introduced the suffix tree
    o McCreight (1976) simplified the construction
    o Ukkonen (1995)   online, linear-time construction algorithm
    o Farach (1997)    optimal construction algorithm for all alphabets
   Trie
   string: S, length: N
   Suffix tree of S:
    o the paths from the root to the leaves have a one-to-one relationship
        with the suffixes of S.
    o edges spell non-empty strings.
    o all internal nodes (except perhaps the root) have at least two
        children
    -- reference. Wikipedia. Suffix tree
   String S = {peeper$}; Suffix(S,0) = {peeper$}
          ROOT
          +-- p -> e -> e -> p -> e -> r   [peeper$]
   String S = {peeper$}; Suffix(S,1) = {eeper$}
          ROOT
          +-- p -> e -> e -> p -> e -> r   [peeper$]
          +-- e -> e -> p -> e -> r        [eeper$]
   String S = {peeper$}; Suffix(S,2) = {eper$}
          ROOT
          +-- p -> e -> e -> p -> e -> r   [peeper$]
          +-- e
              +-- e -> p -> e -> r         [eeper$]
              +-- p -> e -> r              [eper$]
   String S = {peeper$}; Suffix(S,3) = {per$}
          ROOT
          +-- p -> e
          |   +-- e -> p -> e -> r         [peeper$]
          |   +-- r                        [per$]
          +-- e
              +-- e -> p -> e -> r         [eeper$]
              +-- p -> e -> r              [eper$]
   String S = {peeper$}; Suffix(S,4) = {er$}
          ROOT
          +-- p -> e
          |   +-- e -> p -> e -> r         [peeper$]
          |   +-- r                        [per$]
          +-- e
              +-- e -> p -> e -> r         [eeper$]
              +-- p -> e -> r              [eper$]
              +-- r                        [er$]
   String S = {peeper$}; Suffix(S,5) = {r$}
          ROOT
          +-- p -> e
          |   +-- e -> p -> e -> r         [peeper$]
          |   +-- r                        [per$]
          +-- e
          |   +-- e -> p -> e -> r         [eeper$]
          |   +-- p -> e -> r              [eper$]
          |   +-- r                        [er$]
          +-- r                            [r$]
   However, this isn't a suffix tree. It's a suffix trie: every edge carries a single character.
   A suffix trie can be compressed into a suffix tree by merging every chain of single-child nodes into one edge.
   The suffix tree of {peeper$} is completed.
           ROOT
           +-- pe
           |   +-- eper                    [peeper$]
           |   +-- r                       [per$]
           +-- e
           |   +-- eper                    [eeper$]
           |   +-- per                     [eper$]
           |   +-- r                       [er$]
           +-- r                           [r$]
   There are many ways to implement a suffix tree; they differ in how each node stores its children.
    o Sibling lists / unsorted arrays
    o Hash maps
    o Balanced search tree
    o Sorted arrays
    o Hash maps + sibling lists
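                                     Lookup      Insertion   Traversal
   Sibling lists / unsorted arrays   O(σ)        Θ(1)        Θ(1)
   Hash maps                         Θ(1)        Θ(1)        O(σ)
   Balanced search tree              O(log σ)    O(log σ)    O(1)
   Sorted arrays                     O(log σ)    O(σ)        O(1)
   Hash maps + sibling lists         O(1)        O(1)        O(1)
   (σ is the alphabet size; the costs are those given in the Wikipedia suffix tree article.)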
   How to implement the suffix tree/trie – child && sibling
        ROOT
        +-- -85 -> 0
        |   +-- 0 -> -85 -> 0 -> 72
        |   +-- 72
        +-- 0
        |   +-- 0 -> -85 -> 0 -> 72
        |   +-- -85 -> 0 -> 72
        |   +-- 72
        +-- 72
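   struct node {
      struct node *child, *sibling;  /* first child and next sibling of this node */
      int c_num, s_num;              /* kept as in the slides; their meaning is not explained there */
      int slope;                     /* symbol stored at this node; presumably the values drawn in the diagram above */
      int node_type;                 /* root / inter-node / leaf / terminal */
      char *obslist_file;            /* file that holds seldom-queried data (external memory) */
    };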
   node_type indicates what kind of node this is
    (root / inter-node / leaf / terminal).
   obslist_file is used for external memory.
    Data that is seldom queried is recorded in this file.
   What can be done if the trie is too big?
    o If the trie is built with child-sibling links, every subtree is a binary tree.
    o Record its in-order sequence and its pre-order or post-order sequence.
    o Use the two sequences to reconstruct a subtree when we want to query it.
   Wikipedia – suffix tree
    http://en.wikipedia.org/wiki/Suffix_tree
   Data Structures, Algorithms, & Applications in Java Suffix Trees
    Copyright 1999 Sartaj Sahni
    http://www.cise.ufl.edu/~sahni/dsaaj/enrich/c16/suffix.htm#tree
   Websites for suffix tree/trie
     o   http://blog.csdn.net/ljsspace/article/details/6581850
     o   http://www.allisons.org/ll/AlgDS/Tree/Suffix/
     o   http://blog.csdn.net/TsengYuen/article/details/4815921
     o   http://www.cppblog.com/yuyang7/archive/2009/03/29/78252.html

Introduction of suffix tree
