5. Problem 1: Data Volume
• Cost of analysis follows Moore's Law:
– 1 student with 1 computer to analyze 1 Mb of data produced in 2001
– 200 students and 200 computers to analyze all data produced for the same cost today (10 Gb)
6. Problem 2: Fragmented Analytical Landscape
1. Tools separated by compute platform, data format, integration issues, and programming model
2. Mixture of desktop, command line, database, and web-based tools
3. Labor-intensive, fragile solutions devised to reach scientific objectives
4. Little ability to share results or analytical methods
5. Lack of reproducibility
7. Scalability
ABI 3730 DNA Analyzer, Illumina Genome Analyzer, Joe Felsenstein ca. 1980, Ranger Cluster at TACC
10. Major Ways to Access iPlant
• Storing and sharing data large and small: iPlant Data
Storage
• Integrated web-based analysis: The Discovery Environment
• Cloud computing: Atmosphere
• Applications: TNRS, TreeViewer, PhytoBisque, etc
• Scientific networking, knowledgebase and information
exchange: My-Plant.org
• Educational tools: DNASubway
• Embedding iPlant CI capabilities into software: The
Foundation API
• High Performance Computing for experts: TeraGrid/XSEDE
11. Why is the tree of life important?
“Knowledge of evolutionary relationships is
fundamental to biology, yielding new insights
across the plant sciences, from comparative
genomics and molecular evolution, to plant
development, to the study of adaptation,
speciation, community assembly, and
ecosystem functioning.”
12. Nothing in biology makes
sense except in the light
of evolution.
T. G. Dobzhansky
15. "We combined geospatial and molecular
sequence data from two public archives to
produce a 1,230-taxon phylogeny of the grasses
with accompanying climate data for all species,
extracted from more than 1.1 million herbarium
specimens."
Edwards and Smith, 2010
16. "Here we show that grasses are ancestrally a
warm-adapted clade and that C4 evolution
was not correlated with shifts between
temperate and tropical biomes. Instead, 18
of 20 inferred C4 origins were correlated with
marked reductions in mean annual
precipitation."
24. iPlant's APIs – The Foundation API
Service Endpoint   Role
IO                 File storage, retrieval and management; database interoperability
DATA               File format conversion
APPS               Registration and discovery of HPC applications
JOB                Submission and management of compute jobs
SYSTEMS            Availability and info about XSEDE hosts
PROFILE            User profile discovery
AUTH               Token-based secure authentication
POSTIT             URL shortener
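As a rough illustration of how a portal builder might organize calls to these services, the sketch below keeps the slide's service names in a small registry and builds one URL per service. The base URL (`https://api.example.org/`), the lowercase path layout, and the `endpoint_url` helper are all hypothetical placeholders; the slide does not give the real URLs, and this is not the Foundation API's documented interface.

```python
from urllib.parse import urljoin

# Service names and roles, copied from the slide.
SERVICES = {
    "IO": "File storage, retrieval and management; database interoperability",
    "DATA": "File format conversion",
    "APPS": "Registration and discovery of HPC applications",
    "JOB": "Submission and management of compute jobs",
    "SYSTEMS": "Availability and info about XSEDE hosts",
    "PROFILE": "User profile discovery",
    "AUTH": "Token-based secure authentication",
    "POSTIT": "URL shortener",
}

# HYPOTHETICAL base URL: a placeholder, not the real service address.
BASE_URL = "https://api.example.org/"


def endpoint_url(service: str, base_url: str = BASE_URL) -> str:
    """Build a URL for a named service.

    ASSUMPTION: each service lives under its lowercased name; the real
    path layout is not given in the slides.
    """
    name = service.upper()
    if name not in SERVICES:
        raise KeyError(f"unknown service: {service}")
    return urljoin(base_url, name.lower() + "/")


if __name__ == "__main__":
    for name, role in SERVICES.items():
        print(f"{name:8} {endpoint_url(name):32} {role}")
```

Keeping the registry separate from the URL construction means only `BASE_URL` and `endpoint_url` would need to change once the real addresses are known.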
29. Overview of the iPlant Data Store
Some important items we won’t see in the demo
Source            Destination   Copy method   Time (seconds)
CD                My Computer   cp            320
Berkeley Server   My Computer   scp           150
External Drive    My Computer   cp            36
USB 2.0 Flash     My Computer   cp            30
iDS               My Computer   iget          18
My Computer       My Computer   cp            15
Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley
100 GB: 29 min 15 s
1 GB: about 17.5 seconds
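The two transfer figures above agree with each other, as a quick check shows. The sketch below converts the 100 GB transfer time into seconds per gigabyte and an approximate throughput, then compares the table's copy times against the iget row. It assumes decimal units (1 GB = 1000 MB), which the slide does not state, and the helper names are illustrative only.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds duration to whole seconds."""
    return minutes * 60 + seconds


def seconds_per_gb(total_seconds: float, size_gb: float) -> float:
    """Average time needed to move one gigabyte."""
    return total_seconds / size_gb


def throughput_mb_per_s(total_seconds: float, size_gb: float,
                        mb_per_gb: int = 1000) -> float:
    """Average throughput; ASSUMES 1 GB = 1000 MB (not stated on the slide)."""
    return size_gb * mb_per_gb / total_seconds


# 100 GB in 29 min 15 s, as reported on the slide.
total = to_seconds(29, 15)
per_gb = seconds_per_gb(total, 100)
rate = throughput_mb_per_s(total, 100)
print(f"{total} s total, {per_gb:.2f} s/GB, about {rate:.0f} MB/s")

# Copy times in seconds from the table (the file size is not given there).
copy_times = {
    "CD (cp)": 320,
    "Berkeley Server (scp)": 150,
    "External Drive (cp)": 36,
    "USB 2.0 Flash (cp)": 30,
    "iDS (iget)": 18,
    "My Computer (cp)": 15,
}

# Express each time as a multiple of the iget time.
baseline = copy_times["iDS (iget)"]
relative = {source: t / baseline for source, t in copy_times.items()}
for source, ratio in relative.items():
    print(f"{source:24} {ratio:5.2f}x the iget time")
```

In the table, only a local copy on the same machine is faster than iget; every other source takes longer.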
31. Tree Visualization
• > 500K Taxa
• Fast
• Web based, platform independent
• Semantic zooming
• Metadata driven display of information
36. a) Centaurium curvistamineum
(Wittr.) Abrams (1951)
b) Centaurium minimum (Howell)
Piper (1915)
c) Centaurium muhlenbergii (Griseb.)
Wight ex Piper (1906)
d) Centaurium muhlenbergii (Griseb.)
Wight ex Piper forma albiflorum
(Suksd.) St. John (1937)
e) Centaurium muhlenbergii (Griseb.)
Wight ex Piper var. albiflorum
Suksd. (1927)
f) Centaurodes muhlenbergii
(Griseb.) Kuntze (1891)
g) Erythraea curvistaminea Wittr.
(1886)
h) Erythraea minima Howell (1901)
i) Erythraea muhlenbergii Griseb.
(1839)
Image: Gordon Leppig & Andrea J. Pickart
39. Request Tool Installation
Apps -> Create -> New App
Create New -> Request Tool Installation
Fill out forms and submit.
Receive response in 2-5 business days.
Editor's notes
Bringing a culture of computing to the Plant Sciences.
The state of the art today. On the left are icons representing SOME of the ways we work with data. Tools are separated from one another by compute platform, data format, integration issues, and programming model. Often a mixture of desktop, command-line, database, and web-page-based analyses. Labor-intensive, fragile solutions devised to reach scientific objectives. Little ability to share results or analytical methods, or to work collaboratively. We can INVERT the language of the COMPLAINTS to form DESIGN PRINCIPLES. Going to focus on a couple of NGS cases in my talk.
Left tree: Maple tree phylogeny from D. Ackerly. Left picture: Joe Felsenstein, ca. 1980. Right picture: Ranger cluster at TACC.
Our understanding of the phylogeny of the half million known species of green plants has expanded dramatically over the past two decades. The task of assembling a comprehensive "tree of life" for them presents a Grand Challenge. Also part of the grand challenge is developing the necessary infrastructure to view and use the tree of life, to put it into the hands of plant biologists.
Public archives:
MAT = Mean Annual Temperature.
Stephen Smith: iPlant-supported postdoc, now Assistant Professor at the University of Michigan.
Published in PNAS last year.
Left tree: Maple tree phylogeny from D. Ackerly. Left picture: Joe Felsenstein, ca. 1980. Right picture: Ranger cluster at TACC. New sequencing technologies, computational power, and simplified access to computational resources allow us to move from local to global scale. Climate change, nutrition: global scale.
Highest level of abstraction. Exactly like we can embed recent tweets in our web page, portal builders can add tools and services to their portals. E.g. BioExtract and CIPRES
From the Apps catalog in the DE, select Create -> New App. This opens the Tool Integration interface. Select Create New -> Request Tool Installation. Fill out the form and submit it. It takes 2-5 business days to deploy the tool.