Max Planck Institute @ Tübingen, Feb. 25th, 2010
Mining Frequent Subgraphs from Linear Graphs
Yasuo Tabei, Computational Biology Research Center, AIST
Joint work with Daisuke Okanohara (Univ. of Tokyo), Shuichi Hirose (AIST), Koji Tsuda (AIST)
Outline
 - The need for a frequent subgraph mining algorithm
 - What is a linear graph?
 - Subgraph enumeration algorithm for a linear graph
 - Extension to a frequent subgraph mining algorithm
 - Motif extraction from protein 3D structures in molecular biology
 - Phrase extraction from predicate-argument structures in NLP
Frequent Subgraph Mining
Enumerate all frequent subgraphs in a graph database
 - Input: graph database G = {g1, g2, …, gN}
 - Output: frequent subgraphs appearing in at least m graphs
(Figure: example database graphs G1, G2, G3)
gSpan algorithm (Yan et al., 2002)
 - Rightmost pattern extension
 - Duplicates can occur
 - Minimum DFS code checking
 - Time exponential in the pattern size
Linear Graph (Davydov et al., 2004)
Labeled graph whose vertices are totally ordered
Linear graph g = (V, E, L_V, L_E)
 - V ⊂ N: ordered vertex set
 - E ⊆ V × V: edge set
 - L_V: V → Σ_V: vertex labeling
 - L_E: E → Σ_E: edge labeling
Ex) RNA, protein, alternative splicing forms, PAS
(Figure: a linear graph with vertices 1 to 6 labeled A, B, A, B, C, A and edges labeled a, a, b, c)
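A minimal Python sketch of the definition above, assuming vertices are numbered 1..n and each edge is stored as a pair (i, j) with i < j. The class name `LinearGraph`, its fields, and the example edges are illustrative choices, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class LinearGraph:
    """Labeled graph whose vertices 1..n are totally ordered (hypothetical layout)."""
    vertex_labels: list                         # vertex_labels[k - 1] plays the role of L_V(k)
    edges: dict = field(default_factory=dict)   # (i, j) with i < j  ->  edge label L_E((i, j))

    def add_edge(self, i, j, label=None):
        # Assumption: an edge is written with its left endpoint first.
        if not 1 <= i < j <= len(self.vertex_labels):
            raise ValueError("expected 1 <= i < j <= number of vertices")
        self.edges[(i, j)] = label


# Vertex labels follow the slide's example (A B A B C A). The edge placement
# is made up, because the figure did not survive extraction.
g = LinearGraph(list("ABABCA"))
g.add_edge(1, 3, "a")
g.add_edge(2, 6, "b")
g.add_edge(4, 5, "c")
```

Storing edges in a dictionary keyed by (i, j) keeps the edge labeling and the edge set in one place; an unlabeled graph can simply leave every label as `None`.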
Linear Graph (Davydov et al., 2004)
Labeled graph whose vertices are totally ordered
Many types of data can be represented as linear graphs
 - Alternative splicing forms
 - RNA secondary structures
 - Predicate-argument structures
Linear Subgraph Relation
g1 is a linear subgraph of g2 ⇔
 i) The ordinary subgraph condition holds
   - the vertex labels match
   - all edges of g1 also exist in g2 with the correct labels
 ii) The order of the vertices is conserved
Ex) g1 ⊂ g2 (Figure: a 3-vertex linear graph g1 and a 6-vertex linear graph g2, with vertex labels drawn from A, C, G, T)
Example of a Graph That Is Not a Linear Subgraph
g1 is not a linear subgraph of g2
 - the vertex labels match
 - all edges of g1 also exist in g2 with the correct labels
 - the order of the vertices is not conserved
Ex) g1 ⊄ g2 (Figure: linear graphs g1 and g2 with vertex labels A and B and edge labels b and c)
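The two conditions can be checked by brute force. The sketch below is only an illustration of the definition, not the algorithm of this talk: it tries every order-preserving placement of g1's vertices in g2, which is exponential in general. Vertices are 0-based here, edges are dictionaries keyed by (i, j) with i < j, and the function name is hypothetical.

```python
from itertools import combinations


def is_linear_subgraph(labels1, edges1, labels2, edges2):
    """Return True if g1 = (labels1, edges1) is a linear subgraph of g2."""
    n1, n2 = len(labels1), len(labels2)
    # combinations() yields increasing index tuples, so mapping vertex k of g1
    # to image[k] in g2 always conserves the vertex order (condition ii).
    for image in combinations(range(n2), n1):
        # Condition i, part 1: vertex labels must match.
        if any(labels1[k] != labels2[image[k]] for k in range(n1)):
            continue
        # Condition i, part 2: every edge of g1 exists in g2 with the same label.
        if all(
            (image[i], image[j]) in edges2
            and edges2[(image[i], image[j])] == label
            for (i, j), label in edges1.items()
        ):
            return True
    return False


# g1: A -b- B. It occurs in "B A B" (as vertices 1 and 2), but not in "B A":
# there the edge and the labels are present, yet the vertex order is reversed.
g1 = ("AB", {(0, 1): "b"})
print(is_linear_subgraph(*g1, "BAB", {(1, 2): "b"}))
print(is_linear_subgraph(*g1, "BA", {(0, 1): "b"}))
```

The second call is a small instance of the negative example: it would pass an ordinary labeled-subgraph test and fails only because of the ordering condition.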
Total order among edges in a linear graph
Compare the left nodes first. If they are identical, compare the right nodes.
∀ e1 = (i, j), e2 = (k, l) ∈ E_g: e1 <_e e2 if and only if (i) i < k, or (ii) i = k and j < l
(Figure: two edges e1 = (i, j) and e2 = (k, l) drawn over vertices 1 to 4)
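As a quick sketch, the order can be written out directly. Assuming each edge is a Python tuple (left, right), the rule coincides with Python's built-in lexicographic tuple comparison, so `sorted` and `max` can be used on edge lists as they are. The function name `edge_less` is illustrative.

```python
def edge_less(e1, e2):
    """e1 <_e e2: compare left endpoints first, then right endpoints."""
    (i, j), (k, l) = e1, e2
    return i < k or (i == k and j < l)


edges = [(2, 3), (1, 3), (1, 2)]
print(sorted(edges))   # tuple comparison gives the same total order
print(max(edges))      # the largest edge under <_e
```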
Disconnected Patterns
 - Linear graph: sequence + graph
 - In sequence mining, gapped patterns are considered
 - Need to mine disconnected patterns as well
Data represented as disconnected patterns
 - RNA secondary structure
 - Alternative splicing
Outline
 - The need for a frequent subgraph mining algorithm
 - Linear graph
 - Subgraph mining algorithm for a linear graph
 - Frequent subgraph mining algorithm for linear graphs
 - Motif extraction from protein 3D structures in molecular biology
 - Phrase extraction from predicate-argument structures in NLP
Enumeration of All Linear Subgraphs of a Linear Graph
 - Before considering a mining algorithm, we first have to solve the problem of subgraph enumeration
 - How can all subgraphs of a linear graph be enumerated without duplication? (Figure: example linear graph)
Search Lattice of All Subgraphs
(Figure: lattice of all subgraphs arranged by number of edges (level), from the empty graph through levels 1 to 4)
Reverse Search (Avis and Fukuda, 1993)
 - All subgraphs can be enumerated by traversing the search lattice
   - Preventing duplication is difficult
 - Need to define a search tree in the search lattice
 - Reduction map f
   - Mapping from a child to its parent
   - Remove the largest edge
Search Tree Induced by the Reduction Map
By applying the reduction map to each element of the lattice, a search tree rooted at the empty graph is induced.
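A small sketch of how the reduction map turns the lattice into a tree. It simplifies the setting by treating a subgraph as a set of edges of one fixed linear graph (the slides work with patterns), and it builds the whole lattice up front purely for illustration. The function names are hypothetical.

```python
from itertools import combinations


def reduction_map(pattern):
    """f: map a pattern to its parent by removing its largest edge."""
    if not pattern:
        return None                      # the empty graph is the root
    return pattern - {max(pattern)}      # tuples compare left node first


def induced_tree(edges):
    """Apply f to every edge subset and collect parent -> children links."""
    children = {}
    for size in range(len(edges) + 1):
        for combo in combinations(sorted(edges), size):
            node = frozenset(combo)
            children.setdefault(node, [])
            parent = reduction_map(node)
            if parent is not None:
                children.setdefault(parent, []).append(node)
    return children


tree = induced_tree([(1, 2), (1, 3), (2, 3)])
print(len(tree))                              # number of lattice elements
print(sum(len(c) for c in tree.values()))     # number of tree edges
```

Because every non-empty pattern has exactly one parent, the links form a tree: the number of links is always one less than the number of nodes.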
Inverting the Reduction Map (f⁻¹)
 - When traversing the tree from the root, child nodes are created on demand
 - Consider all child candidates
 - Keep only the candidates that the reduction map sends back to the current node
 - However, in this particular case, the reduction map can be inverted explicitly
 - The pattern extension rule (from parent to children) can be derived
Pattern Extension Rule
 - 0-vertex addition: (A-1)
 - 1-vertex addition: (B-1) to (B-7)
 - 2-vertices addition: (C-1) to (C-6)
(Figure: for each case, the parent graph with its largest edge and the newly added edge, positioned relative to vertex i)
Traversing the Search Tree from the Root
Depth-first traversal is used for its memory efficiency.
(Figure: search tree rooted at the empty graph; legend: the largest edge, the newly added edge)
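The depth-first traversal can be sketched without ever building the lattice. This is a simplified stand-in for the pattern extension rule, assuming again that a subgraph is a set of edges of one fixed linear graph: a child is created by adding an edge larger than every edge already present, which is exactly the inverse of "remove the largest edge". The function name is illustrative.

```python
def enumerate_edge_subsets(edges):
    """Yield every edge subset exactly once, in depth-first order."""
    ordered = sorted(edges)                  # total order among edges

    def visit(pattern, next_index):
        yield pattern
        # Children on demand: append one edge that is larger than all edges
        # in `pattern`, so the reduction map sends each child back to `pattern`.
        for idx in range(next_index, len(ordered)):
            yield from visit(pattern + (ordered[idx],), idx + 1)

    yield from visit((), 0)


patterns = list(enumerate_edge_subsets([(2, 3), (1, 2), (1, 3)]))
for p in patterns:
    print(p)
```

Only the current root-to-node path is held in memory, which is the reason the slides give for choosing a depth-first traversal.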
Frequent Subgraph Mining
Basic idea: find all possible extensions of the current pattern in the graph database, and extend the pattern.
Occurrence list L_G(g)
 - Record every occurrence of a pattern g in the graph database G
 - Calculate the support of a pattern g from the occurrence list
 - The support is used for pruning the search tree
(Figure: search tree pruning)
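A hedged sketch of occurrence lists and support, restricted to single-edge patterns to keep it short. The database layout (a list of `(vertex_labels, edges)` pairs), the pattern encoding `(left label, right label, edge label)`, and all function names are assumptions made for this example, not the data structures of the talk.

```python
from collections import defaultdict


def one_edge_occurrences(database):
    """Occurrence lists for all single-edge patterns in the database."""
    occurrences = defaultdict(list)
    for graph_id, (vertex_labels, edges) in enumerate(database):
        for (i, j), edge_label in sorted(edges.items()):
            pattern = (vertex_labels[i], vertex_labels[j], edge_label)
            occurrences[pattern].append((graph_id, (i, j)))   # record every occurrence
    return occurrences


def support(occurrence_list):
    """Number of distinct database graphs that contain the pattern."""
    return len({graph_id for graph_id, _ in occurrence_list})


def frequent_patterns(occurrences, min_support):
    """Keep only patterns with enough support; the rest are pruned."""
    return {p: occ for p, occ in occurrences.items() if support(occ) >= min_support}


database = [
    ({1: "A", 2: "B", 3: "A", 4: "B"}, {(1, 2): "x", (2, 3): "y", (3, 4): "x"}),
    ({1: "A", 2: "B"}, {(1, 2): "x"}),
    ({1: "B", 2: "B"}, {(1, 2): "y"}),
]
occ = one_edge_occurrences(database)
print(sorted(frequent_patterns(occ, min_support=2)))
```

A pattern that occurs twice in the same graph still counts once towards the support. Since an extension of a pattern cannot occur in more graphs than the pattern itself, a branch whose support drops below the threshold does not need to be explored further.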
Outline
 - The need for a frequent subgraph mining algorithm
 - Linear graph
 - Subgraph mining algorithm for a linear graph
 - Frequent subgraph mining algorithm for linear graphs
 - Motif extraction from protein 3D structures in molecular biology
 - Phrase extraction from predicate-argument structures in NLP
Motif extraction from protein 3D structures
 - Pairs of homologous proteins from thermophilic and mesophilic organisms
 - Construct a linear graph from a protein
   - Assign each vertex a label from {1,…,6} according to its property (Mirny, 1999)
   - No edge labels
 - Rank the patterns by statistical significance (p-values)
   - Association with the thermophilic/mesophilic label
   - Fisher's exact test
Applying gSpan
 - We want to compare the execution time of our algorithm with that of gSpan
 - gSpan is not directly applicable
   - Contact maps are not always connected
   - We therefore made 1-gap and 2-gap linear graphs
Runtime comparison
 - The execution time of LGM is reasonable.
 - gSpan does not work on the 2-gap linear graph dataset even when the minimum support threshold is 50.
Minimum support = 10
103 patterns with p-value < 0.001
Thermophilic (TATA), mesophilic (pol II)
Mapping motifs in 3D structure
Phrase extraction from predicate-argument structures
 - Internet movie review dataset (Pang et al., 2004)
   - Sentiment dataset: 5331 positive and 5331 negative opinions
   - 5000 subjective and 5000 objective sentences
 - Extract characteristic phrases (subgraph patterns)
Methods in comparison
 - PAS+gSpan: predicate-argument structure + gSpan; no edges added
 - Dep+FREQT: dependency tree (KSDep) + FREQT (tree miner)
 - Modified PrefixSpan (sequence miner)
Classification Accuracy
 - The accuracy of LGM is better than that of gSpan.
 - The PAS representation is comparable to the other representations.
Phrase structure extraction from predicate-argument structures
 - Only simple sequential patterns are extracted
Phrase structure extraction from predicate-argument structures
 - Phrase structures were extracted.
Other topics
 - Alignment algorithms for RNA sequences (Ph.D. study)
 - All-pairs similarity search method (nearest neighbor graphs)
Q & A
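Data represented as linear graphs
 - DNA, RNA, protein 3D structure, predicate-argument structure
 - Reference point: 5' end (DNA, RNA), N-terminus (protein)
Ex) RNA and protein (Figure: an RNA with vertices 1 to 4 ordered from 5' to 3' and nucleotide labels G, U, G, C; a protein with vertices 1 to 4 ordered from the N- to the C-terminus and residue labels A, R, N, D, with edges at 5 Å)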
Data NOT represented as linear graphs
 - Chemical compounds, gene co-expression networks, social networks, etc.
 - The four vertices v1, v2, v3, v4 can be ordered in 4! ways (v1 v2 v3 v4, v1 v2 v4 v3, …)

Rightmost pattern extension
 - A graph is extended from a vertex on the rightmost path
(Figure: graphs on vertices v1 to v4 with the rightmost path marked)
What is a code for an edge?
 - A code assigned to an edge in a graph: a tuple of vertex ids, vertex labels, and an edge label
Ex) (vertex id 1, vertex id 2, label of vertex 1, label of vertex 2, edge label)
Motif extraction
 - To extract protein 3D motifs, we use Fisher's exact test.
 - The p-value is computed as the sum of the probabilities of all tables that are at least as extreme as the observed table.
 - The frequent subgraphs are ranked according to their p-values.
 - We focus on a pair of proteins: the TATA-binding protein and the human pol II promoter protein.
Table 1: 2×2 contingency table
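A minimal sketch of the p-value computation for a 2×2 table, using only the standard library. It assumes the usual two-sided convention, in which "extreme" means "no more probable than the observed table" and the observed table itself is included; the slides do not say which convention was used. The cell layout in the comments (pattern present or absent, thermophilic or mesophilic) is likewise an assumption for illustration.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Hypothetical layout: rows = pattern present / absent,
    columns = thermophilic / mesophilic.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of the table whose top-left cell is x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


print(fisher_exact_p(3, 0, 0, 3))
print(fisher_exact_p(1, 1, 1, 1))
```

Ranking the frequent subgraphs then amounts to sorting them by this value in ascending order.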
(Figure: learning and prediction loop. A model, e.g. an HMM or SCFG, is learned from annotated data, e.g. DNA, protein, or RNA sequences labeled as gene, protein, or RNA, and is then used to predict annotations for unannotated data, with feedback.)
Algorithms for prediction and learning are based on dynamic programming (DP). The ordering in linear graphs is useful for designing DP algorithms.