Integrating data with phylogenies, at scale
Nico Cellinese
University of Florida
&
Hilmar Lapp
Duke University
WHAT’S IN A NAME?
Chaos!
•  Names and concepts do not reconcile that easily
•  Names are text strings
•  Context is lacking or subjective
•  Meaning is not computable
Linnean names point to concepts
Antoine Laurent de Jussieu
Genera Plantarum, 1789
I don’t understand any of those concepts, whether in Latin or English, but I can still link them to their names, as in one object to one object
…and 200+
…and 400+
Idiosyncratic Russian dolls syndrome
From a human perspective, we lose track of concepts, and it is hard to reconcile all of them. We need help! Can we compute them?
•  We can unclutter concepts, and thereby nomenclature
•  How do we navigate along the Tree of Life repurposing Linnean names, which are linked to traditional concepts?
Dark taxa!
How do we integrate data with this tree?
Tree-thinking
Common descent → evolution at the center of taxonomy
[Figure: tree with tips A, B, C, D, annotated with branches, synapomorphies, and clades = taxa]
Discovery
Communication: how??
[Figure: density plot of diversification rate]
Tree-thinking	
Berberidopsidaceae
Opiliones
Zingiberaceae
Hamamelidaceae
Sarcolaenaceae
Lingulidae
Hymenoptera
Mammalia
Apocynaceae
Galliformes
Rubiaceae
Anarthriaceae
Lineidae
Crocodylidae
Stylosiphonia
Andrenidae
Cracidae
Gavialis
Globba
Micrella
Rhodoleia
Phalangiidae
Tachyglossa
Lyginia
Mediusella
Chamaeclitandra
These names are not generated in an evolution-based framework (groups are defined by character similarity rather than by common descent)
Both the Encyclopedia of Life (EOL) and the Open Tree of Life suggest that Campanuloideae is a misspelling of Campaniloidea (marine gastropods!).
GBIF does not currently have Campanuloideae in its backbone taxonomy.
Are you kidding me?
These are the Campanuloideae!
Wang et al. 2014
Life as a street map
How to navigate life as a machine
Mapping data to phylogenetic knowledge space
Street signs serve people, not machines
•  How do we build a reliable GPS for phylogenies?
•  How do we reproducibly find the right nodes?
FEED
Textual Definition: The hyoglossus is a muscle that attaches to the hyoid and tongue and is innervated by Cranial Nerve XII.
Computable Definition: ('attached to' some 'hyoid bone') and ('attached to' some tongue) and ('innervated by' some 'hypoglossal nerve') and spatially disjoint with 'intrinsic tongue muscle'
Druzinsky et al. (2015): Logic definitions of mammalian feeding muscles by means of necessary and sufficient conditions true for all mammals
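To make the idea of a computable definition concrete, here is a minimal Python sketch that tests the conjunction of conditions from the hyoglossus definition against toy records. The `Muscle` class, the example records, and the closed-world set lookups are all assumptions made for illustration; an OWL reasoner works under open-world semantics and would not treat a missing statement as false.

```python
# Hypothetical, simplified anatomy records. The relation names mirror the
# slide; the data and the set-membership matching are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Muscle:
    name: str
    attached_to: set = field(default_factory=set)
    innervated_by: set = field(default_factory=set)
    spatially_disjoint_with: set = field(default_factory=set)


def is_hyoglossus(m: Muscle) -> bool:
    """Check every condition of the computable definition (all must hold)."""
    return (
        "hyoid bone" in m.attached_to
        and "tongue" in m.attached_to
        and "hypoglossal nerve" in m.innervated_by
        and "intrinsic tongue muscle" in m.spatially_disjoint_with
    )


candidate = Muscle(
    name="unlabeled muscle 1",
    attached_to={"hyoid bone", "tongue"},
    innervated_by={"hypoglossal nerve"},
    spatially_disjoint_with={"intrinsic tongue muscle"},
)
other = Muscle(
    name="unlabeled muscle 2",
    attached_to={"mandible", "tongue"},
    innervated_by={"hypoglossal nerve"},
)

print(is_hyoglossus(candidate))
print(is_hyoglossus(other))
```

The first record satisfies every condition and is classified as a hyoglossus regardless of what it is called; the second fails the hyoid attachment and is not.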
Nomenclature ≠ Semantics
Phyloreference = logic definition of a clade, using the property common to all of life
Phyloreferences
Statements formally expressing the patterns we discover (analogous to map coordinates)
[Figure: three trees, each with tips A, B, C; apomorphy X marked on the third]
•  Node-based: the clade originating with the last common ancestor of B and C.
•  Branch-based: the clade originating with the first ancestor of B that is not an ancestor of A.
•  Apomorphy-based: the clade originating with the first ancestor of C to evolve X.
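The three kinds of definition can be sketched as small functions over a toy tree. The topology (((B, C), D), A), the extra tip D, the node names, and the character states for X are assumptions chosen so that the node-based and branch-based references resolve to different nodes; this is not the tree drawn on the slide, and it is not how an OWL reasoner would compute the result.

```python
# Toy tree as a child -> parent map. The topology (((B, C), D), A), the tip D,
# and the internal node names are assumptions made for illustration only.
PARENT = {
    "A": "root", "n_BCD": "root",
    "D": "n_BCD", "n_BC": "n_BCD",
    "B": "n_BC", "C": "n_BC",
}

# Hypothetical character states: the nodes that carry apomorphy X.
HAS_X = {"n_BC", "B", "C"}


def lineage(node):
    """Return the path from `node` up to the root, starting with `node`."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def node_based(b, c):
    """Last common ancestor of b and c."""
    ancestors_of_c = set(lineage(c))
    return next(n for n in lineage(b) if n in ancestors_of_c)


def branch_based(b, a):
    """Most rootward node on b's lineage that is not on a's lineage."""
    ancestors_of_a = set(lineage(a))
    inside = [n for n in lineage(b) if n not in ancestors_of_a]
    return inside[-1]


def apomorphy_based(c, has_x):
    """Most rootward node on c's lineage carrying X (assumes X is never lost)."""
    carriers = [n for n in lineage(c) if n in has_x]
    return carriers[-1]


print(node_based("B", "C"))
print(branch_based("B", "A"))
print(apomorphy_based("C", HAS_X))
```

On this toy tree the node-based and apomorphy-based references both land on the ancestor of B and C, while the branch-based reference reaches one node deeper and also takes in D.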
Phyloreferences yield a coordinate system for the Tree of Life
•  Any node, branch, or subtree is referenceable
•  References are unambiguous
•  References are computable
•  References are portable
•  References adapt to new and changing knowledge
Many needed technologies already exist
•  OWL ontologies designed for
–  Phylogenetic knowledge: CDAO
–  Phenotypic knowledge: Uberon, PATO, …
•  Efficient and expressive reasoners: FaCT++, HermiT, Racer, ELK
[Figure: phylogeny with tips Campanula_rotundifolia, Pseudonemacladus_oppositifolius, Lobelia_cardinalis, Campanula_latifolia, Cyphocarpus_rigescens, Wahlenbergia_linifolia, Nemacladus_ramosissmus, Lobelia_coronopifolia, Cyphia_elata, Pentaphragma, Crysanthemum, Sphenoclea, Platycodon_grandiflorus, Cyphia_bulbosa; internal nodes numbered 1 to 10; clades labeled Campanula, Lobelia, Cyphia]
Class: Campanulaceae_1889_to_1980
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Crysanthemum
Class: Campanulaceae_1980
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Lobelia
Class: Campanulaceae_after_1995
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
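One way to read these three class expressions is as the largest clade that contains Campanula_latifolia but no tip matching the excluded taxon. The sketch below resolves each expression on a small tree under that reading. The topology is hypothetical and is not the tree shown on the slides, the prefix rule for matching a specifier such as Lobelia to tip labels is an assumption, and the resulting memberships are illustrative rather than a statement about the actual circumscriptions.

```python
# Hypothetical topology as nested tuples; NOT the tree drawn on the slides.
TREE = (
    "Crysanthemum",
    (
        "Sphenoclea",
        (
            ("Lobelia_cardinalis", "Cyphia_elata"),
            ("Campanula_latifolia", "Platycodon_grandiflorus"),
        ),
    ),
)


def tips(node):
    """Return the set of tip labels under `node`."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def matches(tip, specifier):
    """Assumed matching rule: a specifier matches a tip label by prefix."""
    return tip.startswith(specifier)


def resolve(node, included, excluded):
    """Largest clade under `node` with `included` and nothing matching `excluded`."""
    under = tips(node)
    if not any(matches(t, included) for t in under):
        return None
    if not any(matches(t, excluded) for t in under):
        return node
    if isinstance(node, str):
        return None
    for child in node:
        found = resolve(child, included, excluded)
        if found is not None:
            return found
    return None


DEFINITIONS = {
    "Campanulaceae_1889_to_1980": "Crysanthemum",
    "Campanulaceae_1980": "Lobelia",
    "Campanulaceae_after_1995": "Sphenoclea",
}

for name, excluded in DEFINITIONS.items():
    clade = resolve(TREE, "Campanula_latifolia", excluded)
    print(name, sorted(tips(clade)))
```

The same included specifier with three different excluded lineages resolves to three different nested clades, which is the point of giving each circumscription its own computable definition.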
Phyloreferences as ontological expressions
Phyloreference expressions:
•  Can be easily generated by anyone
•  Can work on any tree
•  Can be named and registered
–  To promote reuse and consistency
–  To improve usability and accessibility
Class: Campanulaceae
Annotations:
    rdfs:label "Campanulaceae_after_1995"
    dc:description "the clade that includes Campanula latifolia but not Sphenoclea"
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
vs.
Class: AGF4-SHRU-3560
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
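Since the expressions follow a fixed pattern, generating them from a pair of specifiers can be as simple as filling a template. The sketch below does that in Python; the function name `phyloref_class` and its parameters are hypothetical, and the output only mirrors the textual form used on the slides. It has not been validated as OWL Manchester syntax.

```python
def phyloref_class(name, included, excluded, label=None, description=None):
    """Format a class expression in the textual style shown on the slides.

    Hypothetical helper: it only builds a string and does not check that
    the result parses as OWL.
    """
    lines = [f"Class: {name}"]
    annotations = []
    if label:
        annotations.append(f'    rdfs:label "{label}"')
    if description:
        annotations.append(f'    dc:description "{description}"')
    if annotations:
        lines.append("Annotations:")
        lines.extend(annotations)
    lines.append("EquivalentTo:")
    lines.append(f"    cdao:has_Descendant value taxon:{included}")
    lines.append(f"    and phyloref:excludes_lineage value taxon:{excluded}")
    return "\n".join(lines)


named = phyloref_class(
    "Campanulaceae",
    included="Campanula_latifolia",
    excluded="Sphenoclea",
    label="Campanulaceae_after_1995",
    description="the clade that includes Campanula latifolia but not Sphenoclea",
)
opaque = phyloref_class("AGF4-SHRU-3560", "Campanula_latifolia", "Sphenoclea")

print(named)
print()
print(opaque)
```

Both calls produce the same `EquivalentTo` body; only the class name and annotations differ, which matches the contrast between a human-readable name and an opaque registered identifier.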
Challenges
•  OWL-based data model to satisfy phylogenetic taxonomy, reasoning expressivity, scalability
•  Conventions for data transformation, and consequences of different choices
•  Least common ancestor reasoning for OWL data
•  Lack of a canonical specimen identifier system
•  Specifier mapping ontologies
Tree of Life, ontologized: A universal coordinate system
•  The Tree of Life is itself an aggregation and integration of our phylogenetic knowledge.
•  Phyloreferencing is addressing into a knowledge universe.
•  Ontologies, reasoning, and other knowledge representation (KR) techniques are powerful tools for this.
Acknowledgements
•  National Science Foundation (DBI-1458484)
•  Ken and Linda McGurn
•  Phenoscape
•  EvoIO

Integrating data with phylogenies, at scale