The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences
Naim Matasci
BIO5 / The iPlant Collaborative
What is iPlant?
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
Problem 1: Data Volume

• Cost of analysis follows Moore's Law:
  – 1 Student with 1 computer to analyze 1 Mb of
    data produced in 2001
  – 200 Students and 200 computers to analyze all
    data produced for the same cost today (10 Gb)
Problem 2: Fragmented Analytical Landscape
1. Tools separated by compute platform, data format,
   integration issues, and programming model
2. Mixture of desktop, command-line, database, and
   web-based tools
3. Labor-intensive, fragile solutions devised to reach
   scientific objectives
4. Little ability to share results or analytical methods
5. Lack of reproducibility
Scalability




ABI 3730 DNA Analyzer, Illumina Genome Analyzer, Joe Felsenstein ca. 1980, Ranger Cluster at TACC
Major Ways to Access iPlant
• Storing and sharing data large and small: iPlant Data
  Storage
• Integrated web-based analysis: The Discovery Environment
• Cloud computing: Atmosphere
• Applications: TNRS, TreeViewer, PhytoBisque, etc
• Scientific networking, knowledgebase and information
  exchange: My-Plant.org
• Educational tools: DNASubway
• Embedding iPlant CI capabilities into software: The
  Foundation API
• High Performance Computing for experts: TeraGrid/XSEDE

Why is the tree of life important?


“Knowledge of evolutionary relationships is
fundamental to biology, yielding new insights
across the plant sciences, from comparative
genomics and molecular evolution, to plant
development, to the study of adaptation,
speciation, community assembly, and
ecosystem functioning.”
Nothing in biology makes
sense except in the light
of evolution.

                T. G. Dobzhansky
C3 to C4 Photosynthesis




Xin-Guang et al. 2008
"We combined geospatial and molecular
sequence data from two public archives to
produce a 1,230-taxon phylogeny of the grasses
with accompanying climate data for all species,
extracted from more than 1.1 million herbarium
specimens."




 Edwards and Smith, 2010
"Here we show that grasses are ancestrally a
warm-adapted clade and that C4 evolution
was not correlated with shifts between
temperate and tropical biomes. Instead, 18
of 20 inferred C4 origins were correlated with
marked reductions in mean annual
precipitation."
New Possibilities



Acer glabrum
Acer saccharinum
Acer rubrum
Acer distylum
Acer macrophyllum
Acer nipponicum
Acer spicatum
Acer carpinifolium
Acer diabolicum
Acer circinatum
Acer sieboldianum
Acer palmatum
Acer saccharum
Acer tschonoskii
Acer rufinerve
Acer pensylvanicum
Acer crataegifolium
Acer mono




Illumina Genome Analyzer, Ranger Cluster at TACC, Acer phylogeny (Ackerly 2009), Green Plant ToL
Just Ask
Atmosphere
iPlant's APIs – The Foundation API
Service endpoint   Role
IO                 File storage, retrieval, and management; database interoperability
DATA               File format conversion
APPS               Registration and discovery of HPC applications
JOB                Submission and management of compute jobs
SYSTEMS            Availability of and information about XSEDE hosts
PROFILE            User profile discovery
AUTH               Token-based secure authentication
POSTIT             URL shortener
Consumer Applications




iPlant Data Store




Dramatization: Not the actual iPlant Data Store
Overview of the iPlant Data Store
      Some important items we won’t see in the demo
Source            Destination      Copy method       Time (seconds)
CD                My Computer      cp                320
Berkeley Server   My Computer      scp               150
External Drive    My Computer      cp                36
USB 2.0 Flash     My Computer      cp                30
iDS               My Computer      iget              18
My Computer       My Computer      cp                15

Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley:
100 GB in 29 min 15 s (about 17.5 seconds per GB)
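
The per-gigabyte figure follows directly from the 100 GB transfer time. A minimal Python check of that arithmetic is sketched below. The throughput estimate assumes decimal units (1 GB = 1000 MB), which the slide does not specify, and the function name is illustrative only.

```python
def seconds_per_gb(total_gb: float, minutes: int, seconds: int) -> float:
    """Average transfer time per gigabyte for a transfer of total_gb."""
    total_seconds = minutes * 60 + seconds
    return total_seconds / total_gb


# Figures from the slide: 100 GB transferred in 29 min 15 s.
rate = seconds_per_gb(100, 29, 15)

# Assumption: decimal units, 1 GB = 1000 MB.
throughput_mb_s = 1000 / rate

print(f"{rate:.2f} s per GB")          # 17.55 s per GB
print(f"{throughput_mb_s:.1f} MB/s")   # 57.0 MB/s
```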
Tree Visualization

•   > 500K Taxa
•   Fast
•   Web based, platform independent
•   Semantic zooming
•   Metadata driven display of information
iPlant Tree Viewer




http://portnoy.iplantcollaborative.org/
LIVE TREE VIEW DEMO
Obstacles
Lobelia_kauaensis
Lobelia_villosa
Lobelia_gloria-montis
Trematolobelia_kauaiensis
Trematolobelia_macrostachys
Lobelia_hypoleuca
Lobelia_yuccoides
Lobelia_niihauensis
Brighamia_insignis
Brighamia_rockii
Delissea_rhytidosperma
Delissea_subcordata
Cyanea_pilosa
Cyanea_acuminata
Cyanea_hirtella
Cyanea_coriacea
Cyanea_leptostegia
Clermontia_kakeana
Clermontia_parviflora
Clermontia_arborescens
Clermontia_fauriei




Number of taxa               Taxa names
Taxonomic uncertainty

1. Non-existent names
  •   Misspellings
  •   Contamination
      •   Annotations
      •   Morphospecies
      •   Digitization issues (frame shifts, character
          encoding)
      •   Lexical variants (digitization conventions)
2. Synonymy
  •   Nomenclatural synonyms
  •   Taxonomic synonyms / concepts
3. Misidentifications, incomplete identifications
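
To illustrate the first category (misspellings and lexical variants), here is a minimal sketch of approximate name matching using Python's standard `difflib` module. It is not the method used by the iPlant TNRS. The reference list is a small sample of tip labels from the Obstacles slide, and the misspelled query, the `resolve` helper, and the 0.85 cutoff are all assumptions chosen for the example.

```python
from difflib import get_close_matches

# Hypothetical reference list: a few names taken from the Obstacles slide.
REFERENCE_NAMES = [
    "Lobelia villosa",
    "Brighamia insignis",
    "Brighamia rockii",
    "Cyanea pilosa",
    "Clermontia fauriei",
]


def normalize(label):
    """Turn a tree tip label such as 'Cyanea_pilosa' into 'Cyanea pilosa'."""
    return " ".join(label.replace("_", " ").split())


def resolve(label, reference=REFERENCE_NAMES, cutoff=0.85):
    """Return the exact or closest reference name, or None if nothing is close."""
    name = normalize(label)
    if name in reference:
        return name
    matches = get_close_matches(name, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(resolve("Cyanea_pilosa"))       # exact match after normalization
print(resolve("Brighamia_insignus"))  # invented misspelling, matched approximately
print(resolve("Quercus alba"))        # not in the reference list
```

String similarity only addresses spelling problems. Synonymy and misidentification require curated taxonomic data and cannot be detected this way.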
a) Centaurium curvistamineum (Wittr.) Abrams (1951)
b) Centaurium minimum (Howell) Piper (1915)
c) Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)
d) Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum
   (Suksd.) St. John (1937)
e) Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum
   Suksd. (1927)
f) Centaurodes muhlenbergii (Griseb.) Kuntze (1891)
g) Erythraea curvistaminea Wittr. (1886)
h) Erythraea minima Howell (1901)
i) Erythraea muhlenbergii Griseb. (1839)
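
The nine name strings above can be reduced to their genus-plus-epithet combinations with a few lines of Python. This is a rough sketch that assumes the first two whitespace-separated tokens of each string are the genus and the specific epithet. Real scientific-name parsing (hybrids, infraspecific ranks, authorship) is considerably more involved, and the `binomial` helper is illustrative only.

```python
from collections import Counter

# The nine name strings from the slide, with authorship and year.
NAMES = [
    "Centaurium curvistamineum (Wittr.) Abrams (1951)",
    "Centaurium minimum (Howell) Piper (1915)",
    "Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)",
    "Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum (Suksd.) St. John (1937)",
    "Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum Suksd. (1927)",
    "Centaurodes muhlenbergii (Griseb.) Kuntze (1891)",
    "Erythraea curvistaminea Wittr. (1886)",
    "Erythraea minima Howell (1901)",
    "Erythraea muhlenbergii Griseb. (1839)",
]


def binomial(full_name):
    """Assumption: the first two tokens are the genus and the specific epithet."""
    genus, epithet = full_name.split()[:2]
    return f"{genus} {epithet}"


counts = Counter(binomial(name) for name in NAMES)
genera = sorted({name.split()[0] for name in NAMES})

print(len(counts), "distinct binomials across", len(genera), "genera")
for name, n in counts.items():
    print(n, name)
```

Exact string comparison would treat every one of these as a different entity, which is why synonymy has to be resolved against taxonomic data rather than by text matching.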



Image: Gordon Leppig & Andrea J. Pickart
Request Tool Installation

1. Apps -> Create -> New App
2. Create New -> Request Tool Installation
3. Fill out forms and submit.
4. Receive response in 2-5 days.


Editor's notes

  1. Bringing a culture of computing to the Plant Sciences.
  2. The state of the art today. On the left are icons representing SOME of the ways we work with data. Tools are separated from one another by compute platform, data format, integration issues, and programming model. Often a mixture of desktop, command-line, database, and web-page-based analyses. Labor-intensive, fragile solutions devised to reach scientific objectives. Little ability to share results or analytical methods, or to work collaboratively. We can INVERT the language of the COMPLAINTS to form DESIGN PRINCIPLES. Going to focus on a couple of NGS cases in my talk.
  3. Left tree: Maple tree phylogeny from D. Ackerly. Left picture: Joe Felsenstein, ca. 1980. Right picture: Ranger cluster at TACC.
  4. Our understanding of the phylogeny of the half million known species of green plants has expanded dramatically over the past two decades. The task of assembling a comprehensive "tree of life" for them presents a Grand Challenge. Also part of the grand challenge is developing the necessary infrastructure to view and use the tree of life, to put it into the hands of plant biologists.
  5. Public archives: MAT = Mean Annual Temperature. Stephen Smith: iPlant-supported postdoc, now Assistant Professor at the University of Michigan. Published in PNAS last year.
  6. Left tree: Maple tree phylogeny from D. Ackerly. Left picture: Joe Felsenstein, ca. 1980. Right picture: Ranger cluster at TACC. New sequencing technologies, computational power, and simplified access to computational resources allow us to move from local to global scale. Climate change, nutrition: global scale.
  7. Highest level of abstraction. Just as we can embed recent tweets in a web page, portal builders can add tools and services to their portals, e.g. BioExtract and CIPRES.
  8. From the Apps catalog in the DE, select Create -> New App. This opens the Tool Integration interface. Select Create New -> Request Tool Installation. Fill out the form and submit it. It takes 2-5 business days to deploy the tool.