Sharing re-usable phylogenetic data: we're not there yet

Ross Mounce
@rmounce
http://orcid.org/0000-0002-3520-2046

A talk of
two halves
1.) Outlining the extent of the problem
(lack of) sharing, standards, care (?)
2.) What I'm trying to do about it:
Digging data out of PDFs
Re-releasing as

Where's the data?
Just ~4% of published phylogenetic studies in 2010
publicly archived their supporting phylo data in

Stoltzfus A, O'Meara B, Whitacre J, Mounce R, Gillespie E, Kumar S, Rosauer D, & Vos R. 2012

Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis
BMC Research Notes 10.1186/1756-0500-5-574

Check our data yourself on Dryad here: 10.5061/dryad.h6pf365t

Scientists cannot be relied upon to
share published data upon request
This has been known for a while now
e.g. (in Psychology) Wicherts et al 2006
But has been confirmed to be true for phylogenetics too:
Drew et al 2013 'Lost Branches in the Tree of Life'
report that just ~16% of researchers contacted supplied
the requested ('published') phylo data.
My own experience tallies with this – I soon stopped bothering to try and
ask people via email for a copy of their published data. It's a waste of time.

The (Single) Supplementary Data File
was a Y2K solution – a dump
Many legacy journal supplementary data systems
bury data and leave it there to decompose
Often not re-usable in form e.g. a lazy PDF
Sometimes 'typeset', corrupting the data
A jumble of words & data where the bit you
want is on page 92 (no programmatic access)

Research Data:
BURIED and really not very discoverable

Do reviewers even look at it? I think not tbh

I wasted too much of my PhD
trying to get usable data to re-analyze
This is what I felt like...

So I tried to do something
about it...

An open letter in support of
palaeontology data archiving
www.supportpalaeodatarchiving.co.uk

Which was picked up by Nature News
Which, in turn, got me in touch with:

Part 2
Since few will help you to re-use their data
You've got to dig it out
and
make it re-usable yourself
AND
re-release it openly
so no-one else wastes their time doing this

It's not just phylogenetics.
I learned from the Open Knowledge Conference (Berlin 2011)
that a lot of different academic fields also seem to struggle to
make re-usable published data available.

If it's a common, shared problem...
why not seek a shared, cross-disciplinary solution?

AMI (Amanuensis)
Building upon tools first developed
in computational chemistry by the Murray-Rust lab
e.g.
ChemicalTagger → PhyloTagger (Entity tagging)
(Chem)PubCrawler → (Phylo)PubCrawler
(to get 10,000+ PDFs to work on)

https://bitbucket.org/nickday/pub-crawler
http://www-ucc.ch.cam.ac.uk/products/software/chemicaltagger
Open Source

BBSRC grant approved
“PLUTo: Phyloinformatic Literature Unlocking Tools”
Software for making published phyloinformatic
data discoverable, open, and reusable
...I just need to get my PhD viva done & rubber-stamped

Instructions for getting the current setup working are here:
(multiple repositories, dependencies & requirements!)
http://rossmounce.co.uk/2013/10/06/setting-up-ami2-on-windows/

PDF → AMI → HTML

Evolution of ultraviolet
vision in the largest avian
radiation - the passerines
Anders Ödeen 1*, Olle Håstad 2,3
and Per Alström 4

Styles, superscripts
and diåcritics
preserved!

PDF 
Turdus iliacus
Taeniopygia guttata
Serinus canaria
Lanius excubitor
Melopsittacus undulatus
Pavo cristatus
Sturnus vulgaris
Dolichonyx oryzivorus
Ficedula hypoleuca
Vaccinium myrtillus
Falco tinnunculus

Turdus
Pomatostomus
Leothrix
Amytornis
Acanthisitta
Orthonyx x 2
Malurus
Cnemophilus x 4
Philesturnus x 2
Motacilla x 2
Toxorhampus x 2
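
The Latin names listed above are the kind of entity a tagger has to find in article text. As a rough illustration only (this is not how PhyloTagger or AMI works; the pattern and the function names `tag_binomials` and `genus_counts` are assumptions made up for this sketch), a naive regular expression can pick out candidate binomials and tally genera. It will also produce false positives, e.g. any capitalised word followed by a lowercase word at the start of a sentence.

```python
import re
from collections import Counter

# Hypothetical pattern: capitalised genus, then a lowercase species epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

def tag_binomials(text):
    """Return (genus, species) pairs for every candidate binomial in text."""
    return BINOMIAL.findall(text)

def genus_counts(text):
    """Count how often each candidate genus appears in text."""
    return Counter(genus for genus, _ in tag_binomials(text))

sample = "UV vision was assessed in Turdus iliacus and Sturnus vulgaris."
print(tag_binomials(sample))
```

A real tagger would check candidates against a taxonomic name list rather than trust capitalisation alone.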

Typical phylo tree: 60 nodes, complex and minuscule annotation,
vertical text, hyphenation and valuable branch lengths. AMI extracts ALL

AMI → NexML, HTML
Posterior probability: 0.84, 0.91, 0.93, 0.95
Branch lengths: 23.12, 34.54, 37.21, 38.55
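
Once posterior probabilities and branch lengths have been pulled out of a figure, they need to go back onto a tree in a machine-readable form. The slides name NexML as the target; the sketch below swaps in the much terser Newick format purely for brevity. The tree topology, the pairing of values with nodes, and the `to_newick` helper are all hypothetical, chosen only to show the idea.

```python
def to_newick(node):
    """Serialise a nested-dict tree to Newick (without the trailing ';')."""
    children = node.get("children", [])
    if children:
        inner = ",".join(to_newick(child) for child in children)
        support = node.get("support")
        # Internal node: the support value is written as the node label.
        label = f"({inner})" + (str(support) if support is not None else "")
    else:
        label = node.get("name", "")
    length = node.get("length")
    return label + (f":{length}" if length is not None else "")

# Made-up topology; the numbers are reused from the slide for illustration only.
tree = {
    "children": [
        {"name": "Acanthisitta", "length": 38.55},
        {
            "support": 0.95,
            "length": 23.12,
            "children": [
                {"name": "Turdus", "length": 34.54},
                {"name": "Malurus", "length": 37.21},
            ],
        },
    ]
}

print(to_newick(tree) + ";")
```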

Genus:
Acanthisitta
Acrocephalus
Ailuroedus
Ailuroedus
Amytornis
Camptostoma

Family:
Acanthisittidae
Acanthizidae
Acrocephalidae
Callaeidae
Campephagidae
Cnemophilidae
Corvidae

Acknowledgements & Thanks

For the Panton Fellowship,
inspiration and support

To the organisers
of both the session:
Nico, Hilmar, Rutger
and the conference
as a whole!

For travel & accommodation
support, without which I couldn't
possibly attend TDWG

My main collaborators on PLUTo: Matthew Wills and Peter Murray-Rust
