Working with Trees in the Phyloinformatic Age
William H. Piel, Yale Peabody Museum
Hilmar Lapp, NESCent, Duke University
Dealing with the Growth of Phyloinformatics
Searching Stored Trees
Dewey system:
[Figure: tree with tips A to E; nodes carry the paths 0, 0.1, 0.1.1, 0.1.2, 0.2, 0.2.1, 0.2.1.1, 0.2.1.2, 0.2.2]
Find clade for: Z = (<C S +D s )
Find common pattern starting from left

SELECT *
FROM nodes
WHERE (path LIKE '0.2.1%');

Label   Path
Root    0
NULL    0.1
A       0.1.1
B       0.1.2
NULL    0.2
NULL    0.2.1
C       0.2.1.1
D       0.2.1.2
E       0.2.2
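As a rough sketch of the prefix lookup, the Python below loads the slide's paths into an in-memory SQLite table and finds the clade spanned by C and D. The table layout, the helper names `common_prefix` and `clade_labels`, and the use of SQLite are assumptions made for illustration; they are not taken from the slides.

```python
import sqlite3

# (label, path) pairs copied from the slide's table; internal nodes have no label.
ROWS = [
    ("Root", "0"), (None, "0.1"), ("A", "0.1.1"), ("B", "0.1.2"),
    (None, "0.2"), (None, "0.2.1"), ("C", "0.2.1.1"), ("D", "0.2.1.2"),
    ("E", "0.2.2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (label TEXT, path TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", ROWS)


def common_prefix(paths):
    """Shared leading components, e.g. 0.2.1.1 and 0.2.1.2 give 0.2.1."""
    parts = [p.split(".") for p in paths]
    shared = []
    for column in zip(*parts):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return ".".join(shared)


def clade_labels(conn, tip_labels):
    """Return the spanning node's path and every label found beneath it."""
    marks = ",".join("?" * len(tip_labels))
    paths = [path for (path,) in conn.execute(
        f"SELECT path FROM nodes WHERE label IN ({marks})", list(tip_labels))]
    prefix = common_prefix(paths)
    rows = conn.execute(
        "SELECT label FROM nodes WHERE path = ? OR path LIKE ? ORDER BY path",
        (prefix, prefix + ".%"),
    )
    return prefix, [label for (label,) in rows if label is not None]


prefix, labels = clade_labels(conn, ["C", "D"])
print(prefix, labels)
```

The sketch matches the prefix exactly or followed by a dot, rather than using a bare `LIKE '0.2.1%'`, so that a sibling path such as 0.2.10 would not be swept into the result.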
Searching Stored Trees
Depth-first traversal scoring each node with a left and right ID
[Figure: tree with tips A to E; each node carries a left and a right ID, numbered 1 to 18 in traversal order]
Minimum Spanning Clade of Node 5

SELECT *
FROM nodes
INNER JOIN nodes AS include
  ON (nodes.left_id BETWEEN include.left_id AND include.right_id)
WHERE include.node_id = 5;

Label   Left   Right
-       1      18
-       2      7
A       3      4
B       5      6
-       8      17
-       9      14
C       10     11
D       12     13
E       15     16
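A minimal sketch of the nested-set lookup, run against SQLite from Python. The left and right values are the slide's; the `node_id` numbering is an assumption borrowed from the deck's adjacency-list example, the join alias is shortened to `inc`, and `spanning_node` is a hypothetical helper added here to show how the enclosing node for a set of tips could be found.

```python
import sqlite3

# (node_id, label, left_id, right_id); node_id values are assumed, see lead-in.
ROWS = [
    (1, None, 1, 18), (2, None, 2, 7), (3, "A", 3, 4), (4, "B", 5, 6),
    (5, None, 8, 17), (6, None, 9, 14), (7, "C", 10, 11), (8, "D", 12, 13),
    (9, "E", 15, 16),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, label TEXT, "
    "left_id INTEGER, right_id INTEGER)"
)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", ROWS)


def clade_of(conn, node_id):
    """Tip labels whose left_id falls inside the interval of node_id."""
    rows = conn.execute(
        """
        SELECT nodes.label
        FROM nodes
        INNER JOIN nodes AS inc
          ON nodes.left_id BETWEEN inc.left_id AND inc.right_id
        WHERE inc.node_id = ? AND nodes.label IS NOT NULL
        ORDER BY nodes.left_id
        """,
        (node_id,),
    )
    return [label for (label,) in rows]


def spanning_node(conn, tip_labels):
    """Deepest node whose interval encloses every listed tip."""
    marks = ",".join("?" * len(tip_labels))
    row = conn.execute(
        f"""
        SELECT anc.node_id
        FROM nodes AS anc
        WHERE anc.left_id <= (SELECT MIN(left_id) FROM nodes WHERE label IN ({marks}))
          AND anc.right_id >= (SELECT MAX(right_id) FROM nodes WHERE label IN ({marks}))
        ORDER BY anc.left_id DESC
        LIMIT 1
        """,
        list(tip_labels) * 2,
    ).fetchone()
    return row[0]


print(clade_of(conn, 5))
print(spanning_node(conn, ["C", "E"]))
```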
Searching Stored Trees
[Figure: tree with tips A to E and node IDs 1 to 9, next to its node table]
SQL query to find the parent node of node "D":

SELECT *
FROM nodes AS parent
INNER JOIN nodes AS child
  ON (child.parent_id = parent.node_id)
WHERE child.node_label = 'D';

… but this reaches only one level; navigating the rest of the tree requires an external procedure.

node_label   node_id   parent_id
-            1         -
-            2         1
A            3         2
B            4         2
-            5         1
-            6         5
C            7         6
D            8         6
E            9         5
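The "external procedure" could look something like the sketch below: a Python loop that issues one query per step up the tree. The function names are hypothetical. For comparison, `ancestors_cte` does the same walk with a recursive common table expression, which SQLite and PostgreSQL both support; that variant is an addition of this sketch, not something the slides propose.

```python
import sqlite3

# (node_id, node_label, parent_id), copied from the slide's table.
ROWS = [
    (1, None, None), (2, None, 1), (3, "A", 2), (4, "B", 2), (5, None, 1),
    (6, None, 5), (7, "C", 6), (8, "D", 6), (9, "E", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, node_label TEXT, parent_id INTEGER)"
)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", ROWS)


def parent_of(conn, label):
    """The slide's self-join: the node one level above the labeled node."""
    row = conn.execute(
        """
        SELECT parent.node_id
        FROM nodes AS parent
        INNER JOIN nodes AS child ON child.parent_id = parent.node_id
        WHERE child.node_label = ?
        """,
        (label,),
    ).fetchone()
    return row[0] if row else None


def ancestors(conn, node_id):
    """External procedure: one round trip per level, nearest ancestor first."""
    chain = []
    while True:
        row = conn.execute(
            "SELECT parent_id FROM nodes WHERE node_id = ?", (node_id,)
        ).fetchone()
        if row is None or row[0] is None:
            return chain
        node_id = row[0]
        chain.append(node_id)


def ancestors_cte(conn, node_id):
    """Same walk pushed into the database with WITH RECURSIVE."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(node_id) AS (
            SELECT parent_id FROM nodes WHERE node_id = ?
            UNION ALL
            SELECT n.parent_id FROM nodes AS n JOIN up ON n.node_id = up.node_id
        )
        SELECT node_id FROM up WHERE node_id IS NOT NULL
        """,
        (node_id,),
    )
    return [ancestor for (ancestor,) in rows]


print(parent_of(conn, "D"), ancestors(conn, 8))
```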
Searching Stored Trees
Searching trees by distance metrics: USim distance
Wang, J. T. L., H. Shan, D. Shasha and W. H. Piel. 2005. Fast Structural Search in Phylogenetic Databases. Evolutionary Bioinformatics Online, 1: 37-46

[Figure: two four-tip trees (tips A to D), each with its distance matrix]

Tree 1:
    A  B  C  D
A   0  1  2  3
B   1  0  2  3
C   1  1  0  2
D   1  1  1  0

Tree 2:
    A  B  C  D
A   0  1  2  2
B   1  0  2  2
C   2  2  0  1
D   2  2  1  0
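Read with rows and columns in the order A, B, C, D, the slide's two matrices are consistent with counting the edges from the row tip up to the most recent common ancestor it shares with the column tip. The sketch below computes that count. The two topologies, (((A,B),C),D) and ((A,B),(C,D)), are inferred from the numbers rather than stated on the slide, and this is not the USim formula itself, which is defined in the cited paper.

```python
def up_matrix(tree):
    """Rows and columns are tips in sorted order; entry [u][v] is the number
    of edges from tip u up to the most recent common ancestor of u and v."""
    lineage = {}  # tip -> internal nodes from its parent up to the root

    def visit(node, above, address):
        if isinstance(node, str):
            lineage[node] = above
            return
        for index, child in enumerate(node):
            # An internal node is identified by its address (path of child indexes).
            visit(child, [address] + above, address + (index,))

    visit(tree, [], ())
    tips = sorted(lineage)

    def up(u, v):
        if u == v:
            return 0
        shared = set(lineage[v])
        for steps, ancestor in enumerate(lineage[u], start=1):
            if ancestor in shared:
                return steps

    return [[up(u, v) for v in tips] for u in tips]


# Assumed topologies, written as nested tuples.
LADDER = ((("A", "B"), "C"), "D")
BALANCED = (("A", "B"), ("C", "D"))

for row in up_matrix(LADDER):
    print(row)
```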
Searching Stored Trees
Transitive Closure
Dealing with the Growth of Phyloinformatics
BioSQL: http://www.biosql.org/
Schema for persistent storage of sequences and features, tightly integrated with BioPerl (+ BioPython, BioJava, and BioRuby)
• phylodb extension designed at NESCent Hackathon
• perl command-line interface by Jamie Estill, GSoC
Index of all paths from ancestors to descendants

CREATE TABLE node_path (
  child_node_id  integer,
  parent_node_id integer,
  distance       integer
);

[Figure: tree with tips A, B, C and node IDs 1 to 5, with its ancestor-descendant paths]
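One way such an index might be filled, sketched in Python: walk up from every node and emit one row per ancestor. The input is the nine-node parent table from the deck's adjacency-list example. The helper name `closure_rows` is hypothetical, and leaving out zero-distance self-paths is an assumption of this sketch, not a statement about the phylodb schema.

```python
import sqlite3

# child -> parent, from the deck's adjacency-list example (node 1 is the root).
PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 6, 8: 6, 9: 5}


def closure_rows(parent):
    """Return one (child, ancestor, distance) row per ancestor-descendant pair."""
    rows = []
    for child in parent:
        ancestor, distance = parent[child], 1
        while ancestor is not None:
            rows.append((child, ancestor, distance))
            ancestor, distance = parent[ancestor], distance + 1
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node_path (child_node_id integer, parent_node_id integer, distance integer)"
)
rows = closure_rows(PARENT)
conn.executemany("INSERT INTO node_path VALUES (?, ?, ?)", rows)

# With the index in place, a whole subtree is a single lookup with no recursion.
descendants = [
    child for (child,) in conn.execute(
        "SELECT child_node_id FROM node_path WHERE parent_node_id = 5 "
        "ORDER BY child_node_id"
    )
]
print(len(rows), descendants)
```

The trade-off is size: the index holds one row per ancestor-descendant pair, so it grows faster than the node table as trees get deeper.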
Find all paths where A and B share a common parent_node_id

SELECT pA.parent_node_id
FROM node_path pA, node_path pB, nodes nA, nodes nB
WHERE pA.parent_node_id = pB.parent_node_id
AND pA.child_node_id = nA.node_id
AND nA.node_label = 'A'
AND pB.child_node_id = nB.node_id
AND nB.node_label = 'B';
… of those paths, select the one that has the shortest path

SELECT pA.parent_node_id
FROM node_path pA, node_path pB, nodes nA, nodes nB
WHERE pA.parent_node_id = pB.parent_node_id
AND pA.child_node_id = nA.node_id
AND nA.node_label = 'A'
AND pB.child_node_id = nB.node_id
AND nB.node_label = 'B'
ORDER BY pA.distance
LIMIT 1;
… of those paths, select the one that has the longest path

SELECT pA.parent_node_id
FROM node_path pA, node_path pB, nodes nA, nodes nB
WHERE pA.parent_node_id = pB.parent_node_id
AND pA.child_node_id = nA.node_id
AND nA.node_label = 'A'
AND pB.child_node_id = nB.node_id
AND nB.node_label = 'B'
ORDER BY pA.distance DESC
LIMIT 1;
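The common-ancestor queries can be tried end to end with the sketch below, which runs them in SQLite over the nine-node example tree from the adjacency-list slide. The test data and the wrapper `common_ancestor` are assumptions for illustration; the SQL is the slides' query with the labels turned into parameters.

```python
import sqlite3

# Example tree from the adjacency-list slide: child -> parent, plus tip labels.
PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 6, 8: 6, 9: 5}
LABEL = {3: "A", 4: "B", 7: "C", 8: "D", 9: "E"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id integer, node_label text)")
conn.execute(
    "CREATE TABLE node_path (child_node_id integer, parent_node_id integer, distance integer)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)", [(n, LABEL.get(n)) for n in PARENT]
)
for child in PARENT:
    ancestor, distance = PARENT[child], 1
    while ancestor is not None:
        conn.execute(
            "INSERT INTO node_path VALUES (?, ?, ?)", (child, ancestor, distance)
        )
        ancestor, distance = PARENT[ancestor], distance + 1

QUERY = """
SELECT pA.parent_node_id
FROM node_path pA, node_path pB, nodes nA, nodes nB
WHERE pA.parent_node_id = pB.parent_node_id
  AND pA.child_node_id = nA.node_id AND nA.node_label = ?
  AND pB.child_node_id = nB.node_id AND nB.node_label = ?
ORDER BY pA.distance {direction}
LIMIT 1
"""


def common_ancestor(conn, label_a, label_b, nearest=True):
    """nearest=True gives the shortest path; nearest=False the longest."""
    direction = "ASC" if nearest else "DESC"
    row = conn.execute(
        QUERY.format(direction=direction), (label_a, label_b)
    ).fetchone()
    return row[0] if row else None


print(common_ancestor(conn, "A", "B"), common_ancestor(conn, "A", "B", nearest=False))
```

On this tree the shortest path picks the most recent common ancestor, and the longest path picks the root.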
Find the maximum spanning clade (i.e. the subtree) for each tree that includes A and B but not C:

-- Return an adjacency list for each subtree
SELECT e.parent_id AS parent, e.child_id AS child, ch.node_label, pt.tree_id
FROM node_path p, edges e, nodes pt, nodes ch
WHERE e.child_id = p.child_node_id
AND pt.node_id = e.parent_id
AND ch.node_id = e.child_id
-- Get all ancestors shared by A and B
AND p.parent_node_id IN (
    SELECT pA.parent_node_id
    FROM node_path pA, node_path pB, nodes nA, nodes nB
    WHERE pA.parent_node_id = pB.parent_node_id
    AND pA.child_node_id = nA.node_id
    AND nA.node_label = 'A'
    AND pB.child_node_id = nB.node_id
    AND nB.node_label = 'B')
-- Exclude those that are also ancestors to C
AND NOT EXISTS (
    SELECT 1 FROM node_path np, nodes n
    WHERE np.child_node_id = n.node_id
    AND n.node_label = 'C'
    AND np.parent_node_id = p.parent_node_id);
Find trees that contain a clade that includes A and B but not C:

-- List the set of trees with these ancestors
SELECT DISTINCT t.tree_id, t.name
FROM node_path p, nodes ch, trees t
WHERE ch.node_id = p.child_node_id
AND ch.tree_id = t.tree_id
-- Get all ancestors shared by A and B
AND p.parent_node_id IN (
    SELECT pA.parent_node_id
    FROM node_path pA, node_path pB, nodes nA, nodes nB
    WHERE pA.parent_node_id = pB.parent_node_id
    AND pA.child_node_id = nA.node_id
    AND nA.node_label = 'A'
    AND pB.child_node_id = nB.node_id
    AND nB.node_label = 'B')
-- Exclude those that are also ancestors to C
AND NOT EXISTS (
    SELECT 1 FROM node_path np, nodes n
    WHERE np.child_node_id = n.node_id
    AND n.node_label = 'C'
    AND np.parent_node_id = p.parent_node_id);
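To see the clade query discriminate between topologies, the sketch below loads two made-up three-tip trees into SQLite: tree1 is ((A,B),C) and tree2 is ((A,C),B). Only tree1 has a clade holding A and B without C. The trees, node IDs, and names are hypothetical test data; the query text follows the slide.

```python
import sqlite3

TREES = [(1, "tree1"), (2, "tree2")]
# (node_id, node_label, tree_id, parent_id); hypothetical data for two trees.
NODES = [
    (1, None, 1, None), (2, None, 1, 1), (3, "A", 1, 2), (4, "B", 1, 2),
    (5, "C", 1, 1),
    (11, None, 2, None), (12, None, 2, 11), (13, "A", 2, 12), (14, "C", 2, 12),
    (15, "B", 2, 11),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trees (tree_id integer, name text);
CREATE TABLE nodes (node_id integer, node_label text, tree_id integer);
CREATE TABLE node_path (child_node_id integer, parent_node_id integer, distance integer);
""")
conn.executemany("INSERT INTO trees VALUES (?, ?)", TREES)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [row[:3] for row in NODES])

# Fill the transitive closure by walking up from every node.
parent = {node_id: parent_id for node_id, _, _, parent_id in NODES}
for child in parent:
    ancestor, distance = parent[child], 1
    while ancestor is not None:
        conn.execute(
            "INSERT INTO node_path VALUES (?, ?, ?)", (child, ancestor, distance)
        )
        ancestor, distance = parent[ancestor], distance + 1

QUERY = """
SELECT DISTINCT t.tree_id, t.name
FROM node_path p, nodes ch, trees t
WHERE ch.node_id = p.child_node_id
  AND ch.tree_id = t.tree_id
  AND p.parent_node_id IN (
      SELECT pA.parent_node_id
      FROM node_path pA, node_path pB, nodes nA, nodes nB
      WHERE pA.parent_node_id = pB.parent_node_id
        AND pA.child_node_id = nA.node_id AND nA.node_label = 'A'
        AND pB.child_node_id = nB.node_id AND nB.node_label = 'B')
  AND NOT EXISTS (
      SELECT 1 FROM node_path np, nodes n
      WHERE np.child_node_id = n.node_id
        AND n.node_label = 'C'
        AND np.parent_node_id = p.parent_node_id)
"""

matches = conn.execute(QUERY).fetchall()
print(matches)
```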
Find trees that contain a clade that includes (A, B, C) but not D or E:

SELECT qry.tree_id, MIN(qry.name) AS "tree_name"
FROM (
  SELECT DISTINCT ON (n.node_id) n.node_id, t.tree_id, t.name
  FROM trees t, nodes n,
    -- Get all ancestors of A, B, C from all trees that have A, B, C
    (SELECT DISTINCT ON (inN.tree_id) inP.parent_node_id
     FROM nodes inN, node_path inP
     WHERE inN.node_label IN ('A','B','C')
     AND inP.child_node_id = inN.node_id
     GROUP BY inN.tree_id, inP.parent_node_id
     HAVING COUNT(inP.child_node_id) = 3   -- number of ingroups that share node
     ORDER BY inN.tree_id, inP.parent_node_id DESC) AS lca
  WHERE n.node_id IN (lca.parent_node_id)
  AND t.tree_id = n.tree_id
  -- Exclude those that are also ancestors to D, E
  AND NOT EXISTS (
    SELECT 1 FROM nodes outN, node_path outP
    WHERE outN.node_label IN ('D','E')
    AND outP.child_node_id = outN.node_id
    AND outP.parent_node_id = lca.parent_node_id)
  -- But make sure that the tree still contains D, E
  AND EXISTS (
    SELECT c.tree_id FROM trees c, nodes q
    WHERE q.node_label IN ('D','E')
    AND q.tree_id = c.tree_id
    AND c.tree_id = t.tree_id
    GROUP BY c.tree_id
    HAVING COUNT(c.tree_id) = 2)   -- number of non-ingroups that must be in tree
) AS qry
GROUP BY (qry.tree_id)
HAVING COUNT(qry.node_id) = 1;   -- number of clades that each tree must satisfy
Here's a faster, cleaner version:

SELECT t.tree_id, t.name
FROM trees t
INNER JOIN (
    SELECT DISTINCT ON (inN.tree_id) inP.parent_node_id, inN.tree_id
    FROM nodes inN, node_path inP
    WHERE inN.node_label IN ('A','B','C')
    AND inP.child_node_id = inN.node_id
    GROUP BY inN.tree_id, inP.parent_node_id
    HAVING COUNT(inP.child_node_id) = 3
    ORDER BY inN.tree_id, inP.parent_node_id DESC) AS lca USING (tree_id)
WHERE NOT EXISTS (
    SELECT 1 FROM nodes outN, node_path outP
    WHERE outN.node_label IN ('D','E')
    AND outP.child_node_id = outN.node_id
    AND outP.parent_node_id = lca.parent_node_id)
AND EXISTS (
    SELECT c.tree_id FROM trees c, nodes q
    WHERE q.node_label IN ('D','E')
    AND q.tree_id = c.tree_id
    AND c.tree_id = t.tree_id
    GROUP BY c.tree_id
    HAVING COUNT(c.tree_id) = 2);
Matching a whole tree means querying for all clades:
(A, B) but not C, D, E
(C, D) but not A, B, E
(C, D, E) but not A, B
[Figure: tree with tips A to E and node IDs 1 to 9]
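The list of clade queries for a whole tree can be derived mechanically. In the sketch below a tree is a nested tuple of tip names (a representation chosen for this example), `clade_queries` returns each internal clade with its excluded tips, and `same_topology` treats two rooted trees as matching when their clade sets are equal. All function names are hypothetical.

```python
def clade_sets(tree):
    """Return (all tips, set of internal clades excluding the root)."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    everything = visit(tree)
    found.discard(everything)  # the root clade excludes nothing, so skip it
    return everything, found


def clade_queries(tree):
    """One (ingroup, outgroup) pair per clade, as sorted lists."""
    everything, found = clade_sets(tree)
    return sorted((sorted(c), sorted(everything - c)) for c in found)


def same_topology(tree_a, tree_b):
    """True when both rooted trees have the same tips and the same clades."""
    return clade_sets(tree_a) == clade_sets(tree_b)


TREE = (("A", "B"), (("C", "D"), "E"))
for ingroup, outgroup in clade_queries(TREE):
    print(ingroup, "but not", outgroup)
```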
Dealing with the Growth of Phyloinformatics
Mining trees for interesting, general relationship questions:
(((Sus_scrofa, Hippopotamus),Balaenoptera),Equus_caballus)
vs
((Sus_scrofa, (Hippopotamus,Balaenoptera)),Equus_caballus)
[Figure: two trees over Sus scrofa, Hippopotamus, Balaenoptera, Equus caballus, and Felis catus]
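The two hypotheses differ in a single clade, which a short script can make explicit. The parser below is a deliberately minimal sketch: it handles only plain Newick with tip names, with no branch lengths, internal labels, or quoting, and both function names are hypothetical.

```python
import re


def parse_newick(text):
    """Parse a bare Newick string into nested tuples of tip names."""
    tokens = re.findall(r"[(),]|[^(),;\s]+", text)
    position = 0

    def parse_node():
        nonlocal position
        token = tokens[position]
        position += 1
        if token != "(":
            return token  # a tip name
        children = [parse_node()]
        while tokens[position] == ",":
            position += 1
            children.append(parse_node())
        position += 1  # consume the closing ")"
        return tuple(children)

    return parse_node()


def clades(tree):
    """Set of tip sets, one per internal node."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    visit(tree)
    return found


FIRST = "(((Sus_scrofa, Hippopotamus),Balaenoptera),Equus_caballus)"
SECOND = "((Sus_scrofa, (Hippopotamus,Balaenoptera)),Equus_caballus)"

only_first = clades(parse_newick(FIRST)) - clades(parse_newick(SECOND))
only_second = clades(parse_newick(SECOND)) - clades(parse_newick(FIRST))
print(sorted(map(sorted, only_first)), sorted(map(sorted, only_second)))
```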
Even with perfectly resolved OTUs, you will still fail to hit relevant trees:
[Figure: two trees, one with tips Sus scrofa, Hippopotamus, Balaenoptera, Equus caballus, Felis catus; the other with tips Sus celebensis, Hippopotamus, Balaenoptera, Equus asinus, Felis catus]
Step 1: for each clade in all trees in the database, run a stem query on a classification tree (e.g. NCBI)
Stem Queries:
Node 2: (>A, B - C, D, E)
Node 3: (>A - B, C, D, E)
Node 4: (>B - A, C, D, E)
Node 5: (>C, D, E - A, B)
Node 6: (>C, D - A, B, E)
Node 7: (>C - A, B, D, E)
Node 8: (>D - A, B, C, E)
Node 9: (>E - A, B, C, D)
Step 2: label each node with an NCBI taxon id (if there is a match)
Step 3: do the same for the query tree
[Figure: tree with tips A to E and node IDs 1 to 9]
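The stem query strings listed for nodes 2 to 9 can be generated from the tree itself. This sketch assumes the tree is held as nested tuples and that nodes are numbered in preorder, which reproduces the slide's numbering for this example. Running the queries against a classification such as NCBI is outside its scope, and `stem_queries` is a hypothetical name.

```python
def tips(node):
    """All tip names under a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node for tip in tips(child)]


def stem_queries(tree):
    """Map preorder node number -> '(>ingroup - outgroup)' string."""
    all_tips = tips(tree)
    queries = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        node_id = counter
        inside = tips(node)
        outside = [tip for tip in all_tips if tip not in inside]
        if outside:  # the root has nothing to exclude, so it gets no query
            queries[node_id] = f"(>{', '.join(inside)} - {', '.join(outside)})"
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return queries


TREE = (("A", "B"), (("C", "D"), "E"))
for node_id, query in stem_queries(TREE).items():
    print(f"Node {node_id}: {query}")
```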
Rename nodes according to their deepest stem query…
[Figure: two trees. One has tips Gorilla gorilla, Homo sapiens, Pan troglodytes, Macaca sinica, Macaca nigra; the other has tips Gorilla, Homo, Pan, Macaca sinica, Macaca nigra, Pongo pygmaeus, Macaca irus. In both, internal nodes are labeled Hominoidea and Cercopithecoidea]
Dealing with the Growth of Phyloinformatics
PhyloWidget
Thanks

Student Profile Sample - We help schools to connect the data they have, with ...
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 

Working with Trees in the Phyloinformatic Age. WH Piel

  • 1. Working with Trees in the Phyloinformatic Age. William H. Piel (Yale Peabody Museum) and Hilmar Lapp (NESCent, Duke University)
  • 2.
  • 3.
  • 4. Dewey system: every node is labelled with its path from the root (0). Internal nodes: 0.1, 0.2, 0.2.1. Leaves: A = 0.1.1, B = 0.1.2, C = 0.2.1.1, D = 0.2.1.2, E = 0.2.2.
  • 5. Find clade for: Z = (<C S +D s ). Find the common pattern starting from the left:
      SELECT * FROM nodes WHERE (path LIKE '0.2.1%');
      Label | Path
      Root  | 0
      NULL  | 0.1
      A     | 0.1.1
      B     | 0.1.2
      NULL  | 0.2
      NULL  | 0.2.1
      C     | 0.2.1.1
      D     | 0.2.1.2
      E     | 0.2.2
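The prefix search on this slide can be tried out with a small script. The following is a minimal sketch, assuming an in-memory SQLite table named `nodes` with `path` and `label` columns filled from the slide's table; the helper `common_prefix` is hypothetical and not part of the original slides.

```python
import sqlite3

# Assumed schema: one row per node, keyed by its Dewey path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (path TEXT PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [
        ("0", "Root"), ("0.1", None), ("0.1.1", "A"), ("0.1.2", "B"),
        ("0.2", None), ("0.2.1", None), ("0.2.1.1", "C"),
        ("0.2.1.2", "D"), ("0.2.2", "E"),
    ],
)

def common_prefix(path_a, path_b):
    """Hypothetical helper: longest shared run of path components from the left."""
    shared = []
    for x, y in zip(path_a.split("."), path_b.split(".")):
        if x != y:
            break
        shared.append(x)
    return ".".join(shared)

# Common pattern for C (0.2.1.1) and D (0.2.1.2).
prefix = common_prefix("0.2.1.1", "0.2.1.2")

# Match the clade's root plus everything below it.
clade = conn.execute(
    "SELECT path, label FROM nodes WHERE path = ? OR path LIKE ? ORDER BY path",
    (prefix, prefix + ".%"),
).fetchall()
```

The sketch matches `prefix` itself or `prefix` followed by a dot, rather than the slide's bare `'0.2.1%'`, so that a sibling path such as `0.2.10` would not be picked up by accident in a larger tree.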
  • 6.
  • 7.
  • 8. Depth-first traversal scoring each node with a left and right ID. [Figure: tree with leaves A to E, its nodes numbered 1 to 18 with left and right IDs]
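The numbering on this slide can be reproduced with a short depth-first walk. This is a sketch under assumptions: the tree is written as nested Python tuples, with the shape ((A, B), ((C, D), E)) inferred from the left and right IDs in the table on the next slide, and `number_nodes` is a hypothetical name.

```python
def number_nodes(tree):
    """Return (label, left_id, right_id) rows in preorder.

    A leaf is a string; an internal node is a tuple of subtrees.
    One shared counter is incremented when a node is entered (left_id)
    and again when it is left (right_id).
    """
    rows = []
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        slot = len(rows)
        rows.append(None)  # reserve the preorder position
        label = None
        if isinstance(node, tuple):
            for child in node:
                visit(child)
        else:
            label = node
        counter += 1
        rows[slot] = (label, left, counter)

    visit(tree)
    return rows

# Assumed tree shape for the five taxa on the slide.
rows = number_nodes((("A", "B"), (("C", "D"), "E")))
```

A node's descendants are exactly the rows whose `left_id` falls between its own `left_id` and `right_id`, which is what the join on the next slide relies on.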
  • 9. Minimum Spanning Clade of Node 5:
      SELECT *
      FROM nodes
      INNER JOIN nodes AS include
        ON (nodes.left_id BETWEEN include.left_id AND include.right_id)
      WHERE include.node_id = 5;
      Label | Left | Right
      -     | 1    | 18
      -     | 2    | 7
      A     | 3    | 4
      B     | 5    | 6
      -     | 8    | 17
      -     | 9    | 14
      C     | 10   | 11
      D     | 12   | 13
      E     | 15   | 16
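The nested-set join above can be run against the slide's table as follows. This is a minimal sketch: the slide does not show a `node_id` column, so the sketch assumes node IDs are assigned in order of `left_id`, and it uses an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, label TEXT, "
    "left_id INTEGER, right_id INTEGER)"
)
# (label, left_id, right_id) from the slide; node_id 1..9 is auto-assigned
# in insertion order, which is an assumption of this sketch.
conn.executemany(
    "INSERT INTO nodes (label, left_id, right_id) VALUES (?, ?, ?)",
    [
        (None, 1, 18), (None, 2, 7), ("A", 3, 4), ("B", 5, 6),
        (None, 8, 17), (None, 9, 14), ("C", 10, 11), ("D", 12, 13),
        ("E", 15, 16),
    ],
)

# Same join as on the slide, with the node id passed as a parameter.
clade = conn.execute(
    """
    SELECT nodes.node_id, nodes.label
    FROM nodes
    INNER JOIN nodes AS include
      ON (nodes.left_id BETWEEN include.left_id AND include.right_id)
    WHERE include.node_id = ?
    ORDER BY nodes.left_id
    """,
    (5,),
).fetchall()
```

Under that numbering, node 5 is the row with left 8 and right 17, so the query returns that node together with everything nested inside it.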
  • 10.
  • 11.
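  • 12. [Figure: tree with leaves A to E and node IDs 1 to 9] Rows as (node_label, node_id, parent_id): (-, 1, -), (-, 2, 1), (A, 3, 2), (B, 4, 2), (-, 5, 1), (-, 6, 5), (C, 7, 6), (D, 8, 6), (E, 9, 5)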
  • 13. SQL query to find the parent node of node "D":
      SELECT *
      FROM nodes AS parent
      INNER JOIN nodes AS child
        ON (child.parent_id = parent.node_id)
      WHERE child.node_label = 'D';
      … but this requires an external procedure to navigate the tree.
      node_label | node_id | parent_id
      -          | 1       | -
      -          | 2       | 1
      A          | 3       | 2
      B          | 4       | 2
      -          | 5       | 1
      -          | 6       | 5
      C          | 7       | 6
      D          | 8       | 6
      E          | 9       | 5
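To illustrate what "an external procedure" means here, the sketch below repeats the slide's parent lookup in a loop driven from Python until the root is reached. The table contents come from the slide; the function name `ancestors` and the use of SQLite are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "node_label TEXT)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [
        (1, None, None), (2, 1, None), (3, 2, "A"), (4, 2, "B"),
        (5, 1, None), (6, 5, None), (7, 6, "C"), (8, 6, "D"), (9, 5, "E"),
    ],
)

# The slide's self-join, keyed on the child's id instead of its label.
PARENT_SQL = """
SELECT parent.node_id
FROM nodes AS parent
INNER JOIN nodes AS child ON (child.parent_id = parent.node_id)
WHERE child.node_id = ?
"""

def ancestors(conn, node_id):
    """Hypothetical helper: one query per step, nearest ancestor first."""
    path = []
    while True:
        row = conn.execute(PARENT_SQL, (node_id,)).fetchone()
        if row is None:  # the root has no parent row to join to
            return path
        node_id = row[0]
        path.append(node_id)

d_id = conn.execute(
    "SELECT node_id FROM nodes WHERE node_label = 'D'"
).fetchone()[0]
d_ancestors = ancestors(conn, d_id)
```

Each step up the tree costs a separate round trip to the database, which is the drawback the slide points at.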
  • 14.
  • 15. Searching trees by distance metrics: USim distance. Wang, J. T. L., H. Shan, D. Shasha and W. H. Piel. 2005. Fast Structural Search in Phylogenetic Databases. Evolutionary Bioinformatics Online, 1: 37-46.
      [Figure: two trees on taxa A, B, C, D, each with its distance matrix]
      First matrix:
          A  B  C  D
      A   0  1  2  3
      B   1  0  2  3
      C   1  1  0  2
      D   1  1  1  0
      Second matrix:
          A  B  C  D
      A   0  1  2  2
      B   1  0  2  2
      C   2  2  0  1
      D   2  2  1  0
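The two matrices on this slide are consistent with counting, for each row taxon, the number of edges up to its most recent common ancestor with the column taxon. The sketch below computes such a matrix under that assumption, for a ladder-shaped and a balanced four-taxon tree (both shapes are assumed, since the figure did not survive extraction). It does not compute the USim score itself, which is defined in the cited paper.

```python
def leaf_paths(tree, path=()):
    """Map each leaf label to its tuple of child indices from the root."""
    if isinstance(tree, tuple):
        found = {}
        for i, child in enumerate(tree):
            found.update(leaf_paths(child, path + (i,)))
        return found
    return {tree: path}

def up_matrix(tree):
    """Edges from taxon u up to the common ancestor it shares with taxon v."""
    paths = leaf_paths(tree)
    labels = sorted(paths)
    matrix = []
    for u in labels:
        row = []
        for v in labels:
            shared = 0
            for x, y in zip(paths[u], paths[v]):
                if x != y:
                    break
                shared += 1
            # depth of u minus depth of the common ancestor
            row.append(len(paths[u]) - shared)
        matrix.append(row)
    return matrix

# Assumed tree shapes, written as nested tuples.
ladder = up_matrix(((("A", "B"), "C"), "D"))
balanced = up_matrix((("A", "B"), ("C", "D")))
```

Note that the ladder matrix is not symmetric: D is one edge below the root, while A is three edges below it.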
  • 16.
  • 17.
  • 18.
  • 19. BioSQL: http://www.biosql.org/
      • Schema for persistent storage of sequences and features, tightly integrated with BioPerl (+ BioPython, BioJava, and BioRuby)
      • phylodb extension designed at NESCent Hackathon
      • Perl command-line interface by Jamie Estill, GSoC
  • 20. Index of all paths from ancestors to descendants:
      CREATE TABLE node_path (
        child_node_id integer,
        parent_node_id integer,
        distance integer);
      [Figure: example tree with leaves A, B, C and its node_path rows]
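One way to populate such a path index is to walk up from every node and record each ancestor together with its distance. The sketch below does this for the nine-node example tree from slide 12, because the small tree in this slide's figure is not recoverable from the transcript. It records proper ancestors only (distance of at least 1), which is an assumption about the table's contents; `build_node_path` is a hypothetical name.

```python
import sqlite3

# node_id -> parent_id for the tree on slide 12 (root is node 1).
PARENT_OF = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 6, 8: 6, 9: 5}

def build_node_path(parent_of):
    """Return (child_node_id, parent_node_id, distance) for every ancestor pair."""
    rows = []
    for child in parent_of:
        ancestor, distance = parent_of[child], 1
        while ancestor is not None:
            rows.append((child, ancestor, distance))
            ancestor, distance = parent_of[ancestor], distance + 1
    return rows

rows = build_node_path(PARENT_OF)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node_path (child_node_id integer, parent_node_id integer, "
    "distance integer)"
)
conn.executemany("INSERT INTO node_path VALUES (?, ?, ?)", rows)
row_count = conn.execute("SELECT COUNT(*) FROM node_path").fetchone()[0]
```

The index trades storage for speed: it holds one row per ancestor and descendant pair, so ancestor lookups become plain joins with no tree walking.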
  • 21. Find all paths where A and B share a common parent_node_id:
      SELECT pA.parent_node_id
      FROM node_path pA, node_path pB, nodes nA, nodes nB
      WHERE pA.parent_node_id = pB.parent_node_id
      AND pA.child_node_id = nA.node_id
      AND nA.node_label = 'A'
      AND pB.child_node_id = nB.node_id
      AND nB.node_label = 'B';
      [Figure: example tree with leaves A, B, C and its node_path rows]
  • 22. … of those paths, select one that has the shortest path:
      SELECT pA.parent_node_id
      FROM node_path pA, node_path pB, nodes nA, nodes nB
      WHERE pA.parent_node_id = pB.parent_node_id
      AND pA.child_node_id = nA.node_id
      AND nA.node_label = 'A'
      AND pB.child_node_id = nB.node_id
      AND nB.node_label = 'B'
      ORDER BY pA.distance
      LIMIT 1;
      [Figure: example tree with leaves A, B, C and its node_path rows]
  • 23. … of those paths, select one that has the longest path:
      SELECT pA.parent_node_id
      FROM node_path pA, node_path pB, nodes nA, nodes nB
      WHERE pA.parent_node_id = pB.parent_node_id
      AND pA.child_node_id = nA.node_id
      AND nA.node_label = 'A'
      AND pB.child_node_id = nB.node_id
      AND nB.node_label = 'B'
      ORDER BY pA.distance DESC
      LIMIT 1;
      [Figure: example tree with leaves A, B, C and its node_path rows]
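The shortest-path and longest-path variants of the common-ancestor query can be compared side by side. This is a sketch under assumptions: it uses the nine-node tree from slide 12 rather than this slide's figure, an in-memory SQLite database, a path index that stores proper ancestors only, and a hypothetical wrapper named `common_ancestor`.

```python
import sqlite3

PARENT_OF = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 6, 8: 6, 9: 5}
LABELS = {3: "A", 4: "B", 7: "C", 8: "D", 9: "E"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, node_label TEXT)")
conn.execute(
    "CREATE TABLE node_path (child_node_id integer, parent_node_id integer, "
    "distance integer)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)", [(n, LABELS.get(n)) for n in PARENT_OF]
)
for child in PARENT_OF:
    ancestor, distance = PARENT_OF[child], 1
    while ancestor is not None:
        conn.execute(
            "INSERT INTO node_path VALUES (?, ?, ?)", (child, ancestor, distance)
        )
        ancestor, distance = PARENT_OF[ancestor], distance + 1

# The slides' query; only the sort direction differs between the two variants.
COMMON_ANCESTOR_SQL = """
SELECT pA.parent_node_id
FROM node_path pA, node_path pB, nodes nA, nodes nB
WHERE pA.parent_node_id = pB.parent_node_id
AND pA.child_node_id = nA.node_id AND nA.node_label = ?
AND pB.child_node_id = nB.node_id AND nB.node_label = ?
ORDER BY pA.distance {direction}
LIMIT 1
"""

def common_ancestor(label_a, label_b, nearest=True):
    """Hypothetical wrapper: nearest=True sorts ascending, False descending."""
    sql = COMMON_ANCESTOR_SQL.format(direction="ASC" if nearest else "DESC")
    return conn.execute(sql, (label_a, label_b)).fetchone()[0]

nearest_ab = common_ancestor("A", "B")
farthest_ab = common_ancestor("A", "B", nearest=False)
nearest_ce = common_ancestor("C", "E")
```

Sorting ascending returns the most recent common ancestor; sorting descending returns the ancestor farthest from A, which in a rooted tree is the root.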
  • 24. Find the maximum spanning clade (i.e. the subtree) for each tree that includes A and B but not C:
      • Get all ancestors shared by A and B
      • Exclude those that are also ancestors to C
      • Return an adjacency list for each subtree
      SELECT e.parent_id AS parent, e.child_id AS child, ch.node_label, pt.tree_id
      FROM node_path p, edges e, nodes pt, nodes ch
      WHERE e.child_id = p.child_node_id
      AND pt.node_id = e.parent_id
      AND ch.node_id = e.child_id
      AND p.parent_node_id IN (
        SELECT pA.parent_node_id
        FROM node_path pA, node_path pB, nodes nA, nodes nB
        WHERE pA.parent_node_id = pB.parent_node_id
        AND pA.child_node_id = nA.node_id
        AND nA.node_label = 'A'
        AND pB.child_node_id = nB.node_id
        AND nB.node_label = 'B')
      AND NOT EXISTS (
        SELECT 1 FROM node_path np, nodes n
        WHERE np.child_node_id = n.node_id
        AND n.node_label = 'C'
        AND np.parent_node_id = p.parent_node_id);
  • 25. Find trees that contain a clade that includes A and B but not C:
      • Get all ancestors shared by A and B
      • Exclude those that are also ancestors to C
      • List the set of trees with these ancestors
      SELECT DISTINCT t.tree_id, t.name
      FROM node_path p, nodes ch, trees t
      WHERE ch.node_id = p.child_node_id
      AND ch.tree_id = t.tree_id
      AND p.parent_node_id IN (
        SELECT pA.parent_node_id
        FROM node_path pA, node_path pB, nodes nA, nodes nB
        WHERE pA.parent_node_id = pB.parent_node_id
        AND pA.child_node_id = nA.node_id
        AND nA.node_label = 'A'
        AND pB.child_node_id = nB.node_id
        AND nB.node_label = 'B')
      AND NOT EXISTS (
        SELECT 1 FROM node_path np, nodes n
        WHERE np.child_node_id = n.node_id
        AND n.node_label = 'C'
        AND np.parent_node_id = p.parent_node_id);
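The clade search across several stored trees can be exercised on a toy database. The sketch below loads two three-taxon trees into SQLite and runs the slide's query unchanged; the tree names, the loader `load_tree`, and the exact column layout of `trees` and `nodes` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (tree_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, tree_id INTEGER, "
    "node_label TEXT)"
)
conn.execute(
    "CREATE TABLE node_path (child_node_id integer, parent_node_id integer, "
    "distance integer)"
)

def load_tree(tree_id, name, tree):
    """Hypothetical loader: store a nested-tuple tree and its path index."""
    conn.execute("INSERT INTO trees VALUES (?, ?)", (tree_id, name))

    def visit(node, ancestors):
        label = None if isinstance(node, tuple) else node
        node_id = conn.execute(
            "INSERT INTO nodes (tree_id, node_label) VALUES (?, ?)",
            (tree_id, label),
        ).lastrowid
        # nearest ancestor first, so distance counts edges upward
        for distance, ancestor in enumerate(reversed(ancestors), start=1):
            conn.execute(
                "INSERT INTO node_path VALUES (?, ?, ?)",
                (node_id, ancestor, distance),
            )
        if isinstance(node, tuple):
            for child in node:
                visit(child, ancestors + [node_id])

    visit(tree, [])

load_tree(1, "tree_ab", (("A", "B"), "C"))  # A and B form a clade
load_tree(2, "tree_ac", (("A", "C"), "B"))  # A and B only meet at the root

matches = conn.execute(
    """
    SELECT DISTINCT t.tree_id, t.name
    FROM node_path p, nodes ch, trees t
    WHERE ch.node_id = p.child_node_id
    AND ch.tree_id = t.tree_id
    AND p.parent_node_id IN (
      SELECT pA.parent_node_id
      FROM node_path pA, node_path pB, nodes nA, nodes nB
      WHERE pA.parent_node_id = pB.parent_node_id
      AND pA.child_node_id = nA.node_id AND nA.node_label = 'A'
      AND pB.child_node_id = nB.node_id AND nB.node_label = 'B')
    AND NOT EXISTS (
      SELECT 1 FROM node_path np, nodes n
      WHERE np.child_node_id = n.node_id
      AND n.node_label = 'C'
      AND np.parent_node_id = p.parent_node_id)
    """
).fetchall()
```

In the second tree the only ancestor shared by A and B is the root, which is also an ancestor of C, so the `NOT EXISTS` clause filters that tree out.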
  • 26. Find trees that contain a clade that includes (A, B, C) but not D or E:
      • Get all ancestors of A, B, C from all trees that have A, B, C
      • Exclude those that are also ancestors to D, E
      • But make sure that the tree still contains D, E
      SELECT qry.tree_id, MIN(qry.name) AS "tree_name"
      FROM (
        SELECT DISTINCT ON (n.node_id) n.node_id, t.tree_id, t.name
        FROM trees t, nodes n,
          (SELECT DISTINCT ON (inN.tree_id) inP.parent_node_id
           FROM nodes inN, node_path inP
           WHERE inN.node_label IN ('A','B','C')
           AND inP.child_node_id = inN.node_id
           GROUP BY inN.tree_id, inP.parent_node_id
           HAVING COUNT(inP.child_node_id) = 3  -- number of ingroups that share node
           ORDER BY inN.tree_id, inP.parent_node_id DESC) AS lca
        WHERE n.node_id IN (lca.parent_node_id)
        AND t.tree_id = n.tree_id
        AND NOT EXISTS (
          SELECT 1 FROM nodes outN, node_path outP
          WHERE outN.node_label IN ('D','E')
          AND outP.child_node_id = outN.node_id
          AND outP.parent_node_id = lca.parent_node_id)
        AND EXISTS (
          SELECT c.tree_id FROM trees c, nodes q
          WHERE q.node_label IN ('D','E')
          AND q.tree_id = c.tree_id
          AND c.tree_id = t.tree_id
          GROUP BY c.tree_id
          HAVING COUNT(c.tree_id) = 2)  -- number of non-ingroups that must be in tree
      ) AS qry
      GROUP BY (qry.tree_id)
      HAVING COUNT(qry.node_id) = 1;  -- number of clades that each tree must satisfy
  • 27. Here's a faster, cleaner version:
      SELECT t.tree_id, t.name
      FROM trees t
      INNER JOIN
        (SELECT DISTINCT ON (inN.tree_id) inP.parent_node_id, inN.tree_id
         FROM nodes inN, node_path inP
         WHERE inN.node_label IN ('A','B','C')
         AND inP.child_node_id = inN.node_id
         GROUP BY inN.tree_id, inP.parent_node_id
         HAVING COUNT(inP.child_node_id) = 3
         ORDER BY inN.tree_id, inP.parent_node_id DESC) AS lca
        USING (tree_id)
      WHERE NOT EXISTS (
        SELECT 1 FROM nodes outN, node_path outP
        WHERE outN.node_label IN ('D','E')
        AND outP.child_node_id = outN.node_id
        AND outP.parent_node_id = lca.parent_node_id)
      AND EXISTS (
        SELECT c.tree_id FROM trees c, nodes q
        WHERE q.node_label IN ('D','E')
        AND q.tree_id = c.tree_id
        AND c.tree_id = t.tree_id
        GROUP BY c.tree_id
        HAVING COUNT(c.tree_id) = 2);
  • 28. Matching a whole tree means querying for all clades:
      • (A, B) but not C, D, E
      • (C, D) but not A, B, E
      • (C, D, E) but not A, B
      [Figure: tree with leaves A to E and node IDs 1 to 9]
  • 29.
  • 30. Mining trees for interesting, general, relationship questions:
      (((Sus_scrofa, Hippopotamus),Balaenoptera),Equus_caballus)
      vs
      ((Sus_scrofa, (Hippopotamus,Balaenoptera)),Equus_caballus)
      [Figure: two trees on Sus scrofa, Hippopotamus, Balaenoptera, Equus caballus and Felis catus showing the alternative groupings]
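The difference between the two hypotheses on this slide comes down to which clades each tree contains. The sketch below parses both Newick strings with a deliberately minimal parser (labels and parentheses only, no branch lengths or quoting) and compares their clade sets; `parse_newick` and `clades` are hypothetical helpers, not part of any library.

```python
def parse_newick(text):
    """Parse a bare Newick string into nested tuples of leaf labels."""
    text = text.replace(" ", "").rstrip(";")
    pos = 0

    def parse():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [parse()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse())
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    return parse()

def clades(tree):
    """Return the set of leaf sets, one per internal node."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found

first = clades(parse_newick(
    "(((Sus_scrofa, Hippopotamus),Balaenoptera),Equus_caballus)"))
second = clades(parse_newick(
    "((Sus_scrofa, (Hippopotamus,Balaenoptera)),Equus_caballus)"))

only_first = first - second    # clades supported by the first tree alone
only_second = second - first   # clades supported by the second tree alone
```

Each clade that appears in only one set corresponds to an "includes these taxa but not those" query that would hit one hypothesis and miss the other.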
  • 31. Even with perfectly resolved OTUs, you will still fail to hit relevant trees. [Figure: one tree on Sus scrofa, Hippopotamus, Balaenoptera, Equus caballus, Felis catus beside another on Sus celebensis, Hippopotamus, Balaenoptera, Equus asinus, Felis catus]
  • 32. Step 1: for each clade in all trees in the database, run a stem query on a classification tree (e.g. NCBI).
      Stem queries:
      • Node 2: (>A, B - C, D, E)
      • Node 3: (>A - B, C, D, E)
      • Node 4: (>B - A, C, D, E)
      • Node 5: (>C, D, E - A, B)
      • Node 6: (>C, D - A, B, E)
      • Node 7: (>C - A, B, D, E)
      • Node 8: (>D - A, B, C, E)
      • Node 9: (>E - A, B, C, D)
      Step 2: label each node with an NCBI taxon id (if there is a match).
      Step 3: do the same for the query tree.
      [Figure: tree with leaves A to E and node IDs 1 to 9]
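The list of stem queries in Step 1 can be derived mechanically from a stored tree: each non-root node's ingroup is the set of leaves below it and its outgroup is every other leaf. The sketch below builds the query strings from the adjacency list shown on slide 13. The function name `stem_queries` is hypothetical, and running the queries against a classification such as NCBI is not covered.

```python
# node_id -> parent_id and leaf labels, as in the adjacency list on slide 13.
PARENT_OF = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 6, 8: 6, 9: 5}
LABELS = {3: "A", 4: "B", 7: "C", 8: "D", 9: "E"}

def stem_queries(parent_of, labels):
    """Return {node_id: "(>ingroup - outgroup)"} for every non-root node."""
    all_leaves = sorted(labels.values())
    ingroups = {node: set() for node in parent_of}
    # Push each leaf label up to all of its ancestors (and itself).
    for leaf_id, label in labels.items():
        node = leaf_id
        while node is not None:
            ingroups[node].add(label)
            node = parent_of[node]
    queries = {}
    for node in sorted(parent_of):
        outgroup = [leaf for leaf in all_leaves if leaf not in ingroups[node]]
        if outgroup:  # the root has no outgroup, so it gets no stem query
            queries[node] = "(>{} - {})".format(
                ", ".join(sorted(ingroups[node])), ", ".join(outgroup)
            )
    return queries

queries = stem_queries(PARENT_OF, LABELS)
```

Leaf nodes get a query too, with a single taxon as the ingroup, which matches the entries for nodes 3, 4, 7, 8 and 9 on the slide.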
  • 33. Rename nodes according to their deepest stem query… [Figure: two primate trees, one with leaves Gorilla gorilla, Homo sapiens, Pan troglodytes, Macaca sinica, Macaca nigra, the other with Gorilla, Homo, Pan, Macaca sinica, Macaca nigra, Pongo pygmaeus, Macaca irus; both carry the node names Hominoidea and Cercopithecoidea]
  • 34.
  • 35.