 pig.sh
120816
   Abstract
   Construction
   Implementation
   Reference
   Alias: position tree, PAT tree
   Important people
    o Weiner (1973): introduced the suffix tree
    o McCreight (1976): simplified the construction
    o Ukkonen (1995): first online construction, as fast as the fastest algorithms of its time
    o Farach (1997): first construction algorithm that is optimal for all alphabets
   Trie
   String S of length N
   Suffix tree of S:
    o the paths from the root to the leaves are in one-to-one correspondence
        with the suffixes of S.
    o every edge is labelled with a non-empty string.
    o every internal node (except perhaps the root) has at least two
        children.
    (Reference: Wikipedia, "Suffix tree")
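   To make the correspondence between leaves and suffixes concrete, here is a minimal sketch in C that produces the suffixes of S. The helper names suffix_at and suffix_count are hypothetical; suffix_at mirrors the Suffix(S,i) notation used on the following slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: Suffix(S, i) is the part of S starting at index i.
 * In C a suffix needs no copy: it is a pointer into the same buffer. */
const char *suffix_at(const char *s, size_t i)
{
    assert(i <= strlen(s));   /* i == strlen(s) yields the empty suffix */
    return s + i;
}

/* A string of length N has exactly N non-empty suffixes. */
size_t suffix_count(const char *s)
{
    return strlen(s);
}
```

   For example, suffix_at("peeper$", 3) points at "per$", which is Suffix(S,3) in the walkthrough.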
   String S = {peeper$}; Suffix(S,0) = {peeper$}
   ROOT
    +- p - e - e - p - e - r - $          [peeper$]
   String S = {peeper$}; Suffix(S,1) = {eeper$}
   ROOT
    +- p - e - e - p - e - r - $          [peeper$]
    +- e - e - p - e - r - $              [eeper$]
   String S = {peeper$}; Suffix(S,2) = {eper$}
   ROOT
    +- p - e - e - p - e - r - $          [peeper$]
    +- e
       +- e - p - e - r - $               [eeper$]
       +- p - e - r - $                   [eper$]
   String S = {peeper$}; Suffix(S,3) = {per$}
   ROOT
    +- p - e
    |      +- e - p - e - r - $           [peeper$]
    |      +- r - $                       [per$]
    +- e
       +- e - p - e - r - $               [eeper$]
       +- p - e - r - $                   [eper$]
   String S = {peeper$}; Suffix(S,4) = {er$}
   ROOT
    +- p - e
    |      +- e - p - e - r - $           [peeper$]
    |      +- r - $                       [per$]
    +- e
       +- e - p - e - r - $               [eeper$]
       +- p - e - r - $                   [eper$]
       +- r - $                           [er$]
   String S = {peeper$}; Suffix(S,5) = {r$}
   ROOT
    +- p - e
    |      +- e - p - e - r - $           [peeper$]
    |      +- r - $                       [per$]
    +- e
    |  +- e - p - e - r - $               [eeper$]
    |  +- p - e - r - $                   [eper$]
    |  +- r - $                           [er$]
    +- r - $                              [r$]
   However, this is not a suffix tree. It is a suffix trie: every edge
    carries a single character, and many nodes have only one child.
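   The step-by-step insertion above can be sketched in C as follows. This is a minimal illustration, not the author's implementation: the table layout and the names next_node, trie_insert, build_suffix_trie and trie_contains are assumptions made for the example. As on the slides, only Suffix(S,0) to Suffix(S,N-2) are inserted; the suffix consisting of "$" alone is skipped.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 256   /* enough for short demo strings */
#define SIGMA     128   /* ASCII alphabet */

/* next_node[v][c] is the child of node v on character c, or 0 if absent.
 * Node 0 is the root; the root is never a child, so 0 can mean "none". */
static int next_node[MAX_NODES][SIGMA];
static int node_count = 1;   /* the root already exists */

/* Insert one string, creating nodes only where the path does not exist. */
void trie_insert(const char *s)
{
    int v = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        assert(c < SIGMA);
        if (next_node[v][c] == 0) {
            assert(node_count < MAX_NODES);
            next_node[v][c] = node_count++;
        }
        v = next_node[v][c];
    }
}

/* Insert Suffix(S,0) .. Suffix(S,N-2), as on the slides. */
void build_suffix_trie(const char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i + 1 < n; i++)
        trie_insert(s + i);
}

/* A pattern that does not contain the terminator occurs in S exactly
 * when it can be read off a path that starts at the root. */
int trie_contains(const char *pattern)
{
    int v = 0;
    for (; *pattern != '\0'; pattern++) {
        unsigned char c = (unsigned char)*pattern;
        if (c >= SIGMA || next_node[v][c] == 0)
            return 0;
        v = next_node[v][c];
    }
    return 1;
}
```

   After build_suffix_trie("peeper$") the table holds 24 nodes including the root, which shows why the trie wastes space: most of those nodes sit on chains with a single child.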
   A suffix trie can be compressed into a suffix tree: every chain of
    single-child nodes is merged into one edge labelled with the
    concatenated characters.
   The suffix tree of {peeper$} is completed.
   ROOT
    +- pe
    |   +- eper$                          [peeper$]
    |   +- r$                             [per$]
    +- e
    |   +- eper$                          [eeper$]
    |   +- per$                           [eper$]
    |   +- r$                             [er$]
    +- r$                                 [r$]
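   The compression step can be sketched in C as below. This is an illustrative sketch under assumptions: it first builds the trie in a plain lookup table (names such as next_node, compress and build_and_compress are hypothetical) and then reports each maximal chain of single-child nodes as one labelled edge. It is a quadratic-time illustration, not one of the linear-time constructions listed earlier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 256
#define SIGMA     128

static int next_node[MAX_NODES][SIGMA];  /* 0 means "no child" */
static int node_count = 1;               /* node 0 is the root */

static void trie_insert(const char *s)
{
    int v = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        assert(c < SIGMA && node_count < MAX_NODES);
        if (next_node[v][c] == 0)
            next_node[v][c] = node_count++;
        v = next_node[v][c];
    }
}

/* Number of children of v; *only receives the character of the last
 * child found, which is the only child when the result is 1. */
static int child_count(int v, int *only)
{
    int count = 0;
    for (int c = 0; c < SIGMA; c++)
        if (next_node[v][c] != 0) {
            count++;
            *only = c;
        }
    return count;
}

/* Print the compressed edges below v, indented by depth, and return
 * how many there are. A chain ends at a leaf or at a branching node. */
int compress(int v, int depth)
{
    int edges = 0;
    for (int c = 0; c < SIGMA; c++) {
        int u = next_node[v][c];
        if (u == 0)
            continue;
        char label[MAX_NODES];
        int len = 0, only = 0;
        label[len++] = (char)c;
        while (child_count(u, &only) == 1) {   /* extend the chain */
            label[len++] = (char)only;
            u = next_node[u][only];
        }
        label[len] = '\0';
        printf("%*s%s\n", 2 * depth, "", label);
        edges += 1 + compress(u, depth + 1);
    }
    return edges;
}

/* Insert Suffix(S,0) .. Suffix(S,N-2), as on the slides, then compress. */
int build_and_compress(const char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i + 1 < n; i++)
        trie_insert(s + i);
    return compress(0, 0);
}
```

   For "peeper$" the function reports eight edges: three leaving the root (e, pe, r$) and five below them.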
   There are many ways to implement a suffix tree; they differ in how
    each node stores its children.
    o Sibling lists / unsorted arrays
    o Hash maps
    o Balanced search tree
    o Sorted arrays
    o Hash maps + sibling lists
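                                     Lookup       Insertion    Traversal
   Sibling lists / unsorted arrays   O(σ)         Θ(1)         Θ(1)
   Hash maps                         Θ(1)         Θ(1)         O(σ)
   Balanced search tree              O(log σ)     O(log σ)     O(1)
   Sorted arrays                     O(log σ)     O(σ)         O(1)
   Hash maps + sibling lists         O(1)         O(1)         O(1)
   (σ is the alphabet size; the costs are those listed in the Wikipedia
    article cited in the references.)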
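   As one illustration of these trade-offs, here is a minimal sketch in C of the "sorted array" option. The struct layout and the names sorted_children, find_slot, lookup_child and insert_child are hypothetical. Lookup is a binary search over at most σ keys; insertion has to shift the later entries, which is what makes it linear in σ.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHILDREN 128

/* Hypothetical node layout: children kept sorted by their first character. */
struct sorted_children {
    int  count;
    char key[MAX_CHILDREN];     /* first character of each outgoing edge */
    int  target[MAX_CHILDREN];  /* index of the child node */
};

/* Binary search. Returns the slot of c, or, when c is absent, the slot
 * where it would have to be inserted, encoded as -(slot) - 1. */
int find_slot(const struct sorted_children *n, char c)
{
    int lo = 0, hi = n->count;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (n->key[mid] < c)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < n->count && n->key[lo] == c)
        return lo;
    return -lo - 1;
}

/* Lookup: child index for character c, or -1 if there is none. */
int lookup_child(const struct sorted_children *n, char c)
{
    int slot = find_slot(n, c);
    return slot >= 0 ? n->target[slot] : -1;
}

/* Insertion: later entries move one slot to the right. */
void insert_child(struct sorted_children *n, char c, int child)
{
    int slot = find_slot(n, c);
    if (slot >= 0) {              /* already present: overwrite */
        n->target[slot] = child;
        return;
    }
    slot = -slot - 1;
    assert(n->count < MAX_CHILDREN);
    memmove(&n->key[slot + 1], &n->key[slot],
            (size_t)(n->count - slot) * sizeof n->key[0]);
    memmove(&n->target[slot + 1], &n->target[slot],
            (size_t)(n->count - slot) * sizeof n->target[0]);
    n->key[slot] = c;
    n->target[slot] = child;
    n->count++;
}
```

   Because the keys stay sorted, visiting the children in alphabetical order is a plain loop over the array.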
   How to implement the suffix tree/trie: child and sibling links
   ROOT
    +- [-85] - [0]
    |            +- [0] - [-85] - [0] - [72]
    |            +- [72]
    +- [0]
    |   +- [0] - [-85] - [0] - [72]
    |   +- [-85] - [0] - [72]
    |   +- [72]
    +- [72]
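   struct node {
      struct node *child, *sibling;  /* first child; next sibling */
      int c_num, s_num;              /* presumably child and sibling counts */
      int slope;                     /* value stored at this node */
      int node_type;                 /* root / internal / leaf / terminal */
      char *obslist_file;            /* file holding seldom-queried data */
    };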
   node_type indicates what kind of node this is
    (root / internal node / leaf / terminal).
   obslist_file is used for external memory:
    data that is seldom queried is recorded in this file.
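   How the child and sibling links are used during insertion can be sketched as follows. This is a simplified, hypothetical variant of the node above: struct cs_node keeps only the two links and adds a symbol field for the edge character, which is an assumption and not part of the original struct. The sketch never frees its nodes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical node: the two links from the slide plus a
 * symbol field; the other fields of struct node are left out. */
struct cs_node {
    struct cs_node *child;    /* first child */
    struct cs_node *sibling;  /* next child of the same parent */
    char symbol;              /* character on the edge into this node */
};

static struct cs_node *cs_new(char symbol)
{
    struct cs_node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->symbol = symbol;
    return n;
}

/* Find the child of parent labelled c by walking the sibling list. */
struct cs_node *cs_find(const struct cs_node *parent, char c)
{
    for (struct cs_node *n = parent->child; n != NULL; n = n->sibling)
        if (n->symbol == c)
            return n;
    return NULL;
}

/* Insert a string below root; new children go to the front of the list. */
void cs_insert(struct cs_node *root, const char *s)
{
    struct cs_node *v = root;
    for (; *s != '\0'; s++) {
        struct cs_node *u = cs_find(v, *s);
        if (u == NULL) {
            u = cs_new(*s);
            u->sibling = v->child;
            v->child = u;
        }
        v = u;
    }
}

/* Count nodes, treating child/sibling as left/right of a binary tree. */
size_t cs_count(const struct cs_node *n)
{
    if (n == NULL)
        return 0;
    return 1 + cs_count(n->child) + cs_count(n->sibling);
}

/* Insert Suffix(S,0) .. Suffix(S,N-2), as on the slides. */
struct cs_node *cs_build_suffix_trie(const char *s)
{
    struct cs_node *root = cs_new('\0');
    size_t len = strlen(s);
    for (size_t i = 0; i + 1 < len; i++)
        cs_insert(root, s + i);
    return root;
}
```

   Each node costs two pointers regardless of the alphabet size, at the price of a linear scan through the sibling list on every lookup.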
   What can we do if the trie is too big?
    o If the trie is built with child-sibling links (C-S-Link), every subtree is a binary tree.
    o Record its in-order sequence and its pre-order (or post-order) sequence.
    o When a subtree is queried, use the two sequences to reconstruct it
        (this works only if the node identifiers are unique).
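   The reconstruction from two sequences can be sketched in C as below, here with the pre-order and in-order sequences. The struct bt_node and the functions rebuild and post_order are hypothetical names for this example, and the integer ids stand in for whatever unique node identifier is stored on disk. The linear search for the root makes this sketch quadratic in the worst case.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node: child/sibling links play the role of left/right. */
struct bt_node {
    int id;                    /* must be unique within the subtree */
    struct bt_node *child;     /* "left" link */
    struct bt_node *sibling;   /* "right" link */
};

/* Rebuild the tree whose in-order sequence is in[0..n-1]. *pre_pos is
 * the index of the next unused entry of the pre-order sequence pre[]. */
struct bt_node *rebuild(const int *pre, int *pre_pos, const int *in, int n)
{
    if (n == 0)
        return NULL;
    struct bt_node *root = malloc(sizeof *root);
    assert(root != NULL);
    root->id = pre[(*pre_pos)++];     /* pre-order visits the root first */

    int k = 0;                        /* position of the root in in[] */
    while (k < n && in[k] != root->id)
        k++;
    assert(k < n);                    /* the sequences must be consistent */

    /* Everything before the root in in[] belongs to the left subtree,
     * everything after it to the right subtree. */
    root->child   = rebuild(pre, pre_pos, in, k);
    root->sibling = rebuild(pre, pre_pos, in + k + 1, n - k - 1);
    return root;
}

/* Write the post-order sequence into out[]; returns the new write index. */
int post_order(const struct bt_node *t, int *out, int pos)
{
    if (t == NULL)
        return pos;
    pos = post_order(t->child, out, pos);
    pos = post_order(t->sibling, out, pos);
    out[pos++] = t->id;
    return pos;
}
```

   With pre-order 1 2 4 5 3 and in-order 4 2 5 1 3, the rebuilt tree has root 1 with child 2 and sibling 3, and its post-order sequence is 4 5 2 3 1.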
   Wikipedia – suffix tree
    http://en.wikipedia.org/wiki/Suffix_tree
   Data Structures, Algorithms, & Applications in Java Suffix Trees
    Copyright 1999 Sartaj Sahni
    http://www.cise.ufl.edu/~sahni/dsaaj/enrich/c16/suffix.htm#tree
   Websites for suffix tree/trie
     o   http://blog.csdn.net/ljsspace/article/details/6581850
     o   http://www.allisons.org/ll/AlgDS/Tree/Suffix/
     o   http://blog.csdn.net/TsengYuen/article/details/4815921
     o   http://www.cppblog.com/yuyang7/archive/2009/03/29/78252.html

Introduction of suffix tree