So long, computer overlords
How Cloud (and Grid) can liberate research IT – and transform discovery
Ian Foster

The data deluge
MACHO et al.: 1 TB
Palomar: 3 TB
2MASS: 10 TB
GALEX: 30 TB
Sloan: 40 TB
Pan-STARRS: 40,000 TB
100,000 TB
Genomic sequencing output doubles every 9 months; >300 public centers
1,330 molecular biology databases in Nucleic Acids Research (96 in Jan 2001)
Climate Model Intercomparison Project (CMIP) of the IPCC: 36 TB in 2004; 2,300 TB in 2012

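The growth rates on this slide can be made concrete with a little arithmetic. The Python sketch below assumes steady exponential growth (an idealization) and treats 2004 to 2012 as exactly 96 months; the function names are illustrative, not from the talk.

```python
import math


def growth_factor(months: float, doubling_months: float) -> float:
    """Multiple by which a volume grows over `months` if it doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


def implied_doubling_months(start: float, end: float, months: float) -> float:
    """Doubling time implied by steady exponential growth from `start` to `end` over `months`."""
    return months / math.log2(end / start)


# Sequencing output that doubles every 9 months, compounded over 3 years (36 months).
seq_growth = growth_factor(36, 9)
# CMIP archive: 36 TB in 2004 to 2,300 TB in 2012, treated as exactly 96 months apart.
cmip_doubling = implied_doubling_months(36, 2300, 96)
print(f"{seq_growth:.0f}x over 3 years; CMIP doubled every ~{cmip_doubling:.0f} months")
```

Under those assumptions, 9-month doubling compounds to 16x over three years, and the CMIP figures imply the climate archive doubled roughly every 16 months.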
Big science has achieved big successes
OSG: 1.4M CPU-hours/day, >90 sites, >3,000 users, >260 publications in 2010
LIGO: 1 PB of data in the last science run, distributed worldwide
ESG: 1.2 PB of climate data delivered to 23,000 users; 600+ publications
Robust production solutions
Substantial teams and expense
Sustained, multi-year effort
Application-specific solutions, built on common technology
All build on Globus Toolkit software supported by NSF OCI (and DOE)

But small science is struggling
More data, more complex data
Ad-hoc solutions
Inadequate software, hardware
Data plan mandates

Medium-scale science struggles too!
Blanco 4-m telescope on Cerro Tololo (image credit: Roger Smith/NOAO/AURA/NSF)
The Dark Energy Survey receives 100,000 files each night in Illinois
They transmit files to Texas for analysis … then move results back to Illinois
The process must be reliable, routine, and efficient
The cyberinfrastructure team is not large

The challenge of staying competitive
"Well, in our country," said Alice … "you'd generally get to somewhere else, if you ran very fast for a long time, as we've been doing."
"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"

Current approaches are unsustainable
Small laboratories (PI, postdoc, technician, grad students): an estimated 5,000 across the US university community, with an average ill-spent or unmet need of 0.5 FTE per lab?
Medium-scale projects (multiple PIs, a few software engineers): an estimated 500 across the US university community, with an average ill-spent or unmet need of 3 FTE per project?
Total: 4,000 FTE; at ~$100K/FTE, that is ~$400M/yr
Plus computers, storage, opportunity costs, …

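The total on this slide follows directly from its own per-lab and per-project guesses. The sketch below simply reproduces that arithmetic; every input is the slide's estimate, and the constant names are our own.

```python
# All inputs are the slide's own rough estimates (note its question marks).
SMALL_LABS = 5_000           # small laboratories across US universities
FTE_PER_SMALL_LAB = 0.5      # ill-spent or unmet effort per lab
MEDIUM_PROJECTS = 500        # medium-scale projects across US universities
FTE_PER_MEDIUM_PROJECT = 3   # ill-spent or unmet effort per project
COST_PER_FTE = 100_000       # approximate US dollars per FTE-year

total_fte = SMALL_LABS * FTE_PER_SMALL_LAB + MEDIUM_PROJECTS * FTE_PER_MEDIUM_PROJECT
annual_cost = total_fte * COST_PER_FTE

print(f"{total_fte:,.0f} FTE -> ${annual_cost / 1e6:,.0f}M/yr")  # 4,000 FTE -> $400M/yr
```

The 2,500 FTE from small labs plus 1,500 FTE from medium-scale projects gives the slide's 4,000 FTE. As the slide notes, this excludes computers, storage, and opportunity costs.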
And don't forget administrative costs
"42% of the time spent by an average PI on a federally funded research project was reported to be expended on administrative tasks related to that project rather than on research." (Federal Demonstration Partnership faculty burden survey, 2007)

You can run a company from a coffee shop
Because businesses outsource their IT
Software as a Service (SaaS): web presence, email (hosted Exchange), calendar, telephony (hosted VoIP), human resources and payroll, accounting, customer relationship management

And often their large-scale computing too
Software as a Service (SaaS): web presence, email (hosted Exchange), calendar, telephony (hosted VoIP), human resources and payroll, accounting, customer relationship management
Infrastructure as a Service (IaaS): data analytics, content distribution

Let's rethink how we provide research IT
Accelerate discovery and innovation worldwide by providing research IT as a service
Leverage software-as-a-service to provide millions of researchers with unprecedented access to powerful tools; enable a massive shortening of cycle times in time-consuming research processes; and reduce research IT costs dramatically via economies of scale
So long, computer overlords

Time-consuming tasks in science
Run experiments
Collect data
Manage data
Move data
Acquire computers
Analyze data
Run simulations
Compare experiment with simulation
Search the literature
Publish papers
Find, configure, install relevant software
Find, access, analyze relevant data
Order supplies
Write proposals
Write reports
…

Data movement can be surprisingly difficult
Discover endpoints, determine available protocols, negotiate firewalls, configure software, manage space, determine required credentials, configure protocols, detect and respond to failures, determine expected performance, determine actual performance, identify, diagnose, and correct network misconfigurations, integrate with file systems, …
It took 2 weeks and much help from many people to move 10 TB between California and Tennessee (2007 BES report).

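One item on this list, "detect and respond to failures", is a large part of what a fire-and-forget service automates. A common technique is retry with exponential backoff. The Python sketch below shows it, assuming a hypothetical `transfer` callable that raises `OSError` (which includes `ConnectionError`) on a transient fault. It is not Globus Online's actual recovery logic; a production service would typically also resume partial transfers and verify integrity.

```python
import time
from typing import Callable


def transfer_with_retry(
    transfer: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `transfer` until it succeeds and return the number of attempts used.

    A transient fault is any OSError (ConnectionError and TimeoutError are
    subclasses). Between attempts we wait base_delay * 2**(attempt - 1)
    seconds; after the final attempt the last error is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            transfer()
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


class FlakyTransfer:
    """Hypothetical stand-in for a file transfer that drops a few times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures

    def __call__(self) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated network drop")


# Record the requested delays instead of actually sleeping.
delays: list[float] = []
attempts = transfer_with_retry(FlakyTransfer(2), sleep=delays.append)
print(attempts, delays)  # 3 [1.0, 2.0]
```

Injecting `sleep` keeps the example fast and testable. Doubling the delay gives a congested or restarting server progressively more time to recover before the next attempt.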
Globus Online's SaaS/Web 2.0 architecture
Operate: fire-and-forget data movement; automatic fault recovery; high performance; no client software install; across multiple security domains
Command line interface: ls alcf#dtn:/ and scp alcf#dtn:/myfile nersc#dtn:/myfile
HTTP REST interface: POST https://transfer.api.globusonline.org/v0.10/transfer <transfer-doc>
Web interface
Hosted on: GridFTP servers, FTP servers, other protocols (HTTP, WebDAV, SRM, …), Globus Connect on local computers

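The REST interface on this slide accepts a transfer document via HTTP POST. As a hedged sketch, the Python below uses only the standard library to build, without sending, such a request, mirroring the CLI example `scp alcf#dtn:/myfile nersc#dtn:/myfile`. The URL comes from the slide. The JSON field names are hypothetical placeholders rather than the documented v0.10 schema, and authentication is omitted.

```python
import json
import urllib.request

# URL as shown on the slide; everything in the JSON body is a hypothetical placeholder.
TRANSFER_URL = "https://transfer.api.globusonline.org/v0.10/transfer"


def build_transfer_request(
    src_endpoint: str, src_path: str, dst_endpoint: str, dst_path: str
) -> urllib.request.Request:
    """Build, but do not send, a POST carrying a JSON transfer document.

    Field names are illustrative, not the documented v0.10 schema, and no
    authentication header is attached.
    """
    doc = {
        "source_endpoint": src_endpoint,
        "destination_endpoint": dst_endpoint,
        "items": [{"source_path": src_path, "destination_path": dst_path}],
    }
    return urllib.request.Request(
        TRANSFER_URL,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Equivalent in spirit to the CLI command: scp alcf#dtn:/myfile nersc#dtn:/myfile
req = build_transfer_request("alcf#dtn", "/myfile", "nersc#dtn", "/myfile")
print(req.get_method(), req.full_url)
```

This fits the slide's fire-and-forget model. The client only submits a document and leaves progress tracking and fault recovery to the hosted service.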
Example application: UC sequencing facility
Sequencing instrument
Sequencing-specific compute cluster
iBi general-purpose compute cluster
iBi file server (mount drive)
Delivery of data to customer: Mac using Globus Connect

Statistics and user feedback
Launched November 2010
>1,400 users registered
>350 TB of user data moved
>28 million user files moved
>140 endpoints registered
Widely used on TeraGrid/XSEDE, at other centers and facilities, and internationally
>20x faster than SCP; faster than hand-tuned transfers
"Last time I needed to fetch 100,000 files from NERSC, a graduate student babysat the process for a month."
"I expected to spend four weeks writing code to manage my data transfers; with Globus Online, I was up and running in five minutes."
"Globus Online's speed has us planning experiments that we would never have considered previously."

Moving 586 Terabytes in two weeks
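To compare this transfer with the 2007 example of 10 TB in two weeks, the sketch below converts each into an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and exactly 14 days of wall-clock time. The two transfers used different networks in different years, so this is only an order-of-magnitude comparison.

```python
SECONDS_PER_DAY = 86_400


def average_rate_gbps(terabytes: float, days: float) -> float:
    """Average rate in gigabits per second, using decimal units (1 TB = 10**12 bytes)."""
    bits = terabytes * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e9


recent = average_rate_gbps(586, 14)  # 586 TB in two weeks
earlier = average_rate_gbps(10, 14)  # 10 TB in two weeks (2007 BES report)
print(f"{recent:.2f} Gb/s vs {earlier * 1000:.0f} Mb/s ({recent / earlier:.1f}x)")
```

Under these assumptions, 586 TB in two weeks averages roughly 3.9 Gb/s. That is 58.6 times the roughly 66 Mb/s implied by the 10 TB example.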
Monitoring provides deep visibility

Más contenido relacionado

La actualidad más candente

Advanced Research Computing at York
Advanced Research Computing at YorkAdvanced Research Computing at York
Advanced Research Computing at York
Ming Li
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
Chris Dwan
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
Kirill Osipov
 

La actualidad más candente (20)

Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 
Advanced Research Computing at York
Advanced Research Computing at YorkAdvanced Research Computing at York
Advanced Research Computing at York
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sanger
 
Cloud com foster december 2010
Cloud com foster december 2010Cloud com foster december 2010
Cloud com foster december 2010
 
2017 bio it world
2017 bio it world2017 bio it world
2017 bio it world
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
 
Toward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data AnalysisToward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data Analysis
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate Discovery
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
Data Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud AutomationData Tribology: Overcoming Data Friction with Cloud Automation
Data Tribology: Overcoming Data Friction with Cloud Automation
 
Accelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy ScienceAccelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy Science
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
 
Clouds, Grids and Data
Clouds, Grids and DataClouds, Grids and Data
Clouds, Grids and Data
 

Destacado

IT in Healthcare_2016 Research Agenda
IT in Healthcare_2016 Research AgendaIT in Healthcare_2016 Research Agenda
IT in Healthcare_2016 Research Agenda
Neal Tilbury
 
Socializing Big Data: Collaborative Opportunities in Computer Science, the So...
Socializing Big Data: Collaborative Opportunities in Computer Science, the So...Socializing Big Data: Collaborative Opportunities in Computer Science, the So...
Socializing Big Data: Collaborative Opportunities in Computer Science, the So...
Sheryl Grant
 
Banking Industry and Information Technology
Banking Industry and Information TechnologyBanking Industry and Information Technology
Banking Industry and Information Technology
Chandan Pahelwani
 

Destacado (20)

IT in Healthcare_2016 Research Agenda
IT in Healthcare_2016 Research AgendaIT in Healthcare_2016 Research Agenda
IT in Healthcare_2016 Research Agenda
 
Webinar@AIMS: Big Data challenges and solutions in agricultural and environme...
Webinar@AIMS: Big Data challenges and solutions in agricultural and environme...Webinar@AIMS: Big Data challenges and solutions in agricultural and environme...
Webinar@AIMS: Big Data challenges and solutions in agricultural and environme...
 
2016 Data Science Salary Survey
2016 Data Science Salary Survey2016 Data Science Salary Survey
2016 Data Science Salary Survey
 
Pistoia Alliance conference April 2016: Big Data: Introduction
Pistoia Alliance conference April 2016: Big Data: IntroductionPistoia Alliance conference April 2016: Big Data: Introduction
Pistoia Alliance conference April 2016: Big Data: Introduction
 
Socializing Big Data: Collaborative Opportunities in Computer Science, the So...
Socializing Big Data: Collaborative Opportunities in Computer Science, the So...Socializing Big Data: Collaborative Opportunities in Computer Science, the So...
Socializing Big Data: Collaborative Opportunities in Computer Science, the So...
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science Education
 
Social media data for Social science research
Social media data for Social science researchSocial media data for Social science research
Social media data for Social science research
 
"The Human Face of Big Data", Crystal Valentine, VP of Technology Strategy at...
"The Human Face of Big Data", Crystal Valentine, VP of Technology Strategy at..."The Human Face of Big Data", Crystal Valentine, VP of Technology Strategy at...
"The Human Face of Big Data", Crystal Valentine, VP of Technology Strategy at...
 
Frans feldberg
Frans feldbergFrans feldberg
Frans feldberg
 
ICCES'2016 BIG DATA IN HEALTHCARE AND SOCIAL SCIENCES
ICCES'2016  BIG DATA IN HEALTHCARE AND SOCIAL SCIENCESICCES'2016  BIG DATA IN HEALTHCARE AND SOCIAL SCIENCES
ICCES'2016 BIG DATA IN HEALTHCARE AND SOCIAL SCIENCES
 
Banking technology
Banking technologyBanking technology
Banking technology
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 
Banking Industry and Information Technology
Banking Industry and Information TechnologyBanking Industry and Information Technology
Banking Industry and Information Technology
 
Big-data analytics: challenges and opportunities
Big-data analytics: challenges and opportunitiesBig-data analytics: challenges and opportunities
Big-data analytics: challenges and opportunities
 
Thesis: THE ROLE OF INFORMATION TECHNOLOGY ON COMMERCIAL BANKS IN NIGERIA
Thesis: THE ROLE OF INFORMATION TECHNOLOGY ON COMMERCIAL BANKS IN NIGERIAThesis: THE ROLE OF INFORMATION TECHNOLOGY ON COMMERCIAL BANKS IN NIGERIA
Thesis: THE ROLE OF INFORMATION TECHNOLOGY ON COMMERCIAL BANKS IN NIGERIA
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
 
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
 
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 5)
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 5)A Statistician's 'Big Tent' View on Big Data and Data Science (Version 5)
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 5)
 
Smart Cities and Big Data - Research Presentation
Smart Cities and Big Data - Research PresentationSmart Cities and Big Data - Research Presentation
Smart Cities and Big Data - Research Presentation
 
IT in Healthcare
IT in HealthcareIT in Healthcare
IT in Healthcare
 

Similar a So Long Computer Overlords

Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
Evert Lammerts
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Ian Foster
 

Similar a So Long Computer Overlords (20)

eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Microsoft Dryad
Microsoft DryadMicrosoft Dryad
Microsoft Dryad
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores 
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
 
Privacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagePrivacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storage
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Integrating scientific laboratories into the cloud
Integrating scientific laboratories into the cloudIntegrating scientific laboratories into the cloud
Integrating scientific laboratories into the cloud
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 
2023comp90024_Spartan.pdf
2023comp90024_Spartan.pdf2023comp90024_Spartan.pdf
2023comp90024_Spartan.pdf
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Computing Outside The Box
Computing Outside The BoxComputing Outside The Box
Computing Outside The Box
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
 
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvediFundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)
 

Más de Ian Foster

Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
Ian Foster
 

Más de Ian Foster (20)

Global Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptxGlobal Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptx
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
Better Information Faster: Programming the Continuum
Better Information Faster: Programming the ContinuumBetter Information Faster: Programming the Continuum
Better Information Faster: Programming the Continuum
 
ESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsESnet6 and Smart Instruments
ESnet6 and Smart Instruments
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and Computation
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental Science
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
 
Research Automation for Data-Driven Discovery
Research Automation for Data-Driven DiscoveryResearch Automation for Data-Driven Discovery
Research Automation for Data-Driven Discovery
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and Jupyter
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Team Argon Summary
Team Argon SummaryTeam Argon Summary
Team Argon Summary
 
Thoughts on interoperability
Thoughts on interoperabilityThoughts on interoperability
Thoughts on interoperability
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture Ideas
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCF
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Software Infrastructure for a National Research Platform
Software Infrastructure for a National Research PlatformSoftware Infrastructure for a National Research Platform
Software Infrastructure for a National Research Platform
 
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
 

Último

Último (20)

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

So Long Computer Overlords

  • 1. So long, computer overlordsHow Cloud (and Grid) can liberate research IT – and transform discoveryIan Foster
  • 2.
  • 3.
  • 4. The data deluge MACHO et al.: 1 TB Palomar: 3 TB 2MASS: 10 TB GALEX: 30 TB Sloan: 40 TB Pan-STARRS: 40,000 TB 100,000 TB Genomic sequencing output x2 every 9 month >300 public centers 1330molec. bio databases Nucleic Acids Research (96 in Jan 2001) 2004: 36 TB 2012: 2,300 TB Climate model intercomparison project (CMIP) of the IPCC
  • 5. Big science has achieved big successes OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010 LIGO: 1 PB data in last science run, distributed worldwide Robust production solutions Substantial teams and expense Sustained, multi-year effort Application-specific solutions, built on common technology ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs All build on NSF OCI (& DOE)-supported Globus Toolkit software
  • 6. But small science is struggling More data, more complex data Ad-hoc solutions Inadequate software, hardware Data plan mandates
  • 7. Medium-scale science struggles too! Blanco 4m on Cerro Tololo Image credit: Roger Smith/NOAO/AURA/NSF Dark Energy Survey receives 100,000 files each night in Illinois They transmit files to Texas for analysis … then move results back to Illinois Process must be reliable, routine, and efficient The cyberinfrastructure team is not large
  • 8. The challenge of staying competitive "Well, in our country," said Alice … "you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing.” "A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
  • 9. Current approaches are unsustainable Small laboratories PI, postdoc, technician, grad students Estimate 5,000 across US university community Average ill-spent/unmet need of 0.5 FTE/lab? Medium-scale projects Multiple PIs, a few software engineers Estimate 500 across US university community Average ill-spent/unmet need of 3 FTE/project? Total 4000 FTE: at ~$100K/FTE => $400M/yr Plus computers, storage, opportunity costs, …
  • 10. And don’t forget administrative costs 42%of the time spent by an average PI on a federally funded research project was reported to be expended on administrative tasks related to that project rather than on research — Federal Demonstration Partnership faculty burden survey, 2007
  • 11. You can run a company from a coffee shop
  • 12. Because businesses outsource their IT Web presence Email (hosted Exchange) Calendar Telephony (hosted VOIP) Human resources and payroll Accounting Customer relationship mgmt Software as a Service (SaaS)
  • 13. And often their large-scale computing too Web presence Email (hosted Exchange) Calendar Telephony (hosted VOIP) Human resources and payroll Accounting Customer relationship mgmt Data analytics Content distribution Software as a Service (SaaS) Infrastructure as a Service(IaaS)
  • 14. Let’s rethink how we provide research IT Accelerate discovery and innovation worldwide by providing research IT as a service Leverage software-as-a-service to provide millions of researchers with unprecedented access to powerful tools; enable a massive shortening of cycle times intime-consuming research processes; and reduce research IT costs dramatically via economies of scale so long, computer overlords
  • 15.
  • 17. Find, configure, install relevant software
  • 18. Find, access, analyze relevant data
  • 22.
  • 24. Find, configure, install relevant software
  • 25. Find, access, analyze relevant data
  • 29.
  • 30. Data movement can be surprisingly difficult Discover endpoints, determine available protocols, negotiate firewalls, configure software, manage space, determine required credentials, configure protocols, detect and respond to failures, determine expected performance, determine actual performance, identify diagnose and correct network misconfigurations, integrate with file systems, … It took 2 weeks and much help from many people to move 10 TB between California and Tennessee. (2007 BES report) B A
  • 31. Globus Online’sSaaS/Web 2.0 architecture Command line interface lsalcf#dtn:/ scpalcf#dtn:/myfile br />nersc#dtn:/myfile HTTP REST interface POST https://transfer.api.globusonline.org/ v0.10/transfer <transfer-doc> Web interface (Operate) Fire-and-forget data movement Automatic fault recovery High performance No client software install Across multiple security domains (Hosted on) GridFTP servers FTP servers Other protocols: HTTP, WebDAV, SRM, … Globus Connect on local computers
  • 32. Example application: UC sequencing facility. (Diagram: sequencing instrument; sequencing-specific compute cluster; iBi general-purpose compute cluster; iBi file server with mounted drive; delivery of data to the customer’s Mac using Globus Connect.)
  • 33. Statistics and user feedback. Launched November 2010; >1,400 users registered; >350 TB of user data moved; >28 million user files moved; >140 endpoints registered. Widely used on TeraGrid/XSEDE, at other centers and facilities, and internationally. >20× faster than SCP; faster than hand-tuned transfers. “Last time I needed to fetch 100,000 files from NERSC, a graduate student babysat the process for a month.” “I expected to spend four weeks writing code to manage my data transfers; with Globus Online, I was up and running in five minutes.” “Globus Online’s speed has us planning experiments that we would never have considered previously.”
  • 34. Moving 586 Terabytes in two weeks
  • 36. 20 terabytes in less than one day vs. 20 gigabytes in more than two days. (Chart; size axis labeled kilobyte, megabyte, gigabyte, terabyte.)
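The volumes on these slides, and the 10 TB figure from the 2007 BES report, translate into average throughput as follows. This is a worked calculation under stated assumptions: decimal units (1 TB = 10^12 bytes), 14 days for "two weeks", and a transfer running continuously over the whole period.

```python
SECONDS_PER_DAY = 86_400
TB, GB = 10**12, 10**9   # decimal units (assumption)


def rate_bytes_per_s(nbytes: float, days: float) -> float:
    """Average throughput needed to move `nbytes` in `days` days."""
    return nbytes / (days * SECONDS_PER_DAY)


cases = {
    "10 TB in two weeks (2007 BES report)": rate_bytes_per_s(10 * TB, 14),
    "586 TB in two weeks": rate_bytes_per_s(586 * TB, 14),
    # "less than one day" makes this a lower bound on the rate
    "20 TB in one day": rate_bytes_per_s(20 * TB, 1),
    # "more than two days" makes this an upper bound on the rate
    "20 GB in two days": rate_bytes_per_s(20 * GB, 2),
}
for label, rate in cases.items():
    print(f"{label}: {rate / 1e6:,.2f} MB/s ({rate * 8 / 1e9:.3f} Gbit/s)")
```

The 586 TB run averages about 484 MB/s (roughly 3.9 Gbit/s), 58.6 times the 2007 figure, and the two 20-unit transfers differ in average throughput by a factor of at least 2,000.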
  • 37. Common research data management steps. Examples: Dark Energy Survey; Galaxy genomics; LIGO observatory; SBGrid structural biology consortium; NCAR climate data applications; land use change and economics.
  • 38. We have choices of where to compute: campus systems (first target for many researchers); XSEDE supercomputers (220,000 cores, peer-reviewed awards, optimized for scientific computing); Open Science Grid (60,000 cores, high throughput); commercial cloud providers (instant access for small tasks, expensive for big projects). Users insist that they need everything connected.
  • 39. Towards “research IT as a service”
  • 40. Research data management as a service: GO-User (credentials and other profile information); GO-Transfer (data movement); GO-Team (group membership); GO-Collaborate (connect to collaborative tools: Jira, Confluence, …); GO-Store (access to campus, cloud, XSEDE storage); GO-Catalog (on-demand metadata catalogs); GO-Compute (access to computers); GO-Galaxy (share, create, run workflows). Status key: Today, Prototype, Fall.
  • 41. SaaS services in action: The XSEDE vision XUAS
  • 42. Data analysis as a service: early steps. Securely and reliably: assemble code, find computers, deploy code, run program, access data, store data, record workflow, reuse workflow. We have built such systems for biological, environmental, and economics researchers. (Diagram components: VM image, app code, workflow, Galaxy, Condor, data store; step references [1, 2], [3, 4], [5, 6], [7, 8].)
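The last two steps, "record workflow" and "reuse workflow", amount to keeping an ordered list of steps that can be replayed on new inputs. The sketch below is a toy, in-process illustration of that idea; `Step` and `Workflow` are hypothetical names and this is not the Galaxy or Condor API, which would run each step as deployed code on remote computers.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One recorded step: a named function, the inputs it reads, the output it writes."""
    name: str
    func: Callable[..., Any]
    inputs: List[str]
    output: str


@dataclass
class Workflow:
    """Hypothetical record-and-replay workflow (illustration only)."""
    steps: List[Step] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add(self, name: str, func: Callable[..., Any],
            inputs: List[str], output: str) -> "Workflow":
        """Record a step; returns self so that steps can be chained."""
        self.steps.append(Step(name, func, inputs, output))
        return self

    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Replay every recorded step, in order, against a copy of `data`."""
        store = dict(data)
        self.log = []  # provenance record for this run only
        for step in self.steps:
            args = [store[key] for key in step.inputs]
            store[step.output] = step.func(*args)
            self.log.append(f"{step.name}: {', '.join(step.inputs)} -> {step.output}")
        return store
```

Usage: record a two-step workflow once, for example `Workflow().add("clean", ..., ["raw"], "clean").add("mean", ..., ["clean"], "mean")`, then call `run` on each new dataset; the `log` gives a per-run provenance trail.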
  • 43. SaaS economics: a quick tutorial. Lower per-user cost (×10?) via aggregation onto common infrastructure: $400M/yr → $40M/yr? Initial “cost trough” due to fixed costs. Per-user revenue permits positive returns to scale. Further reduce per-user cost over time. (Chart axes: $, with a zero line, vs. time.) A 10× reduction in per-user cost: $50K → $5K/yr per lab; $300K → $30K/yr per project.
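The dollar figures on this slide follow from the earlier estimate of 5,000 small labs at 0.5 FTE each and 500 medium-scale projects at 3 FTE each, at about $100K per FTE. The calculation below reproduces them; the 10× reduction is the slide's hoped-for figure (it carries a question mark), not a measured result.

```python
# Inputs from the earlier "Current approaches are unsustainable" slide.
FTE_COST = 100_000                  # dollars per FTE per year ("~$100K/FTE")
LABS, FTE_PER_LAB = 5_000, 0.5      # small laboratories
PROJECTS, FTE_PER_PROJECT = 500, 3  # medium-scale projects


def annual_cost(reduction: float = 1.0) -> dict:
    """Per-lab, per-project, and total yearly cost after a per-user cost reduction."""
    per_lab = FTE_PER_LAB * FTE_COST / reduction
    per_project = FTE_PER_PROJECT * FTE_COST / reduction
    total = LABS * per_lab + PROJECTS * per_project
    return {"per_lab": per_lab, "per_project": per_project, "total": total}


today = annual_cost()
saas = annual_cost(reduction=10)  # hypothetical x10 aggregation benefit
for key in today:
    print(f"{key}: ${today[key]:,.0f}/yr -> ${saas[key]:,.0f}/yr")
```

Small labs account for $250M of the $400M total and projects for $150M, so a uniform 10× cut brings the total to $40M/yr. The fixed costs that create the initial "cost trough" are not modeled here.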
  • 44. A national cyberinfrastructure strategy? To provide more capability for more people at less cost … Create infrastructure that is robust and universal, with economies of scale and positive returns to scale, via the creative use of aggregation (“cloud”) and federation (“grid”). (Diagram: small and medium laboratories (L) and projects (P) served through SaaS for research data management; collaboration and computation; research administration.)
  • 45. Acknowledgments. Colleagues at UChicago and Argonne: Steve Tuecke, Ravi Madduri, Kyle Chard, Tanu Malik, Michael Russell, Paul Dave, Stuart Martin, Dan Katz, and many others. Colleagues at other institutions: Carl Kesselman, Miron Livny, John Towns, and others. NSF OCI, MPS, and SBE; DOE ASCR; and NIH for support.
  • 46. For more information:
     Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing (May/June):70–73, 2011.
     Allen, B., Bresnahan, J., Childers, L., Foster, I., Kandaswamy, G., Kettimuthu, R., Kordas, J., Link, M., Martin, S., Pickett, K., and Tuecke, S. Globus Online: Radical Simplification of Data Movement via SaaS. Communications of the ACM, 2011.

Speaker notes

  1. I wanted a catchy title, so I chose one that referred to the recent victory of Watson over Brad Rutter and Ken Jennings in Jeopardy!
  2. But my point (perhaps confusingly) is not that new computer capabilities are a bad thing. On the contrary, these capabilities represent a tremendous opportunity for science. The challenge that I want to speak to is how we leverage these capabilities without computers and computation overwhelming the research community in terms of both human and financial resources. The solution, I will suggest, is to get computation out of the lab: to outsource it to third-party providers. I will explain how this task can be achieved.
  3. The need to deal with and benefit from large quantities of data is not a new concept: a series of policy reports, particularly in the US and UK, have noted it over the past several years, describing the new models of science and the investments to be made. A sampling of key reports, in chronological order:
     -- Atkins report, 2003: laid out the vision of cyberinfrastructure, which the UK also used as a roadmap for its eScience program
     -- NSB long-lived data report, 2005: defined data and data scientists, and laid out capture and curation issues
     -- 2020 Science, 2006: outlined the data and computational nature of science
     -- NSF vision document, 2007: consolidated the Atkins report, the long-lived data report, and others to lay out a programmatic plan; DataNet and Cyber-enabled Discovery and Innovation came from this plan
     -- Recent ACI report on data and visualization
     -- Harnessing the Power, NITRD, 2009, for federal agencies
     -- RCUK eScience review
     -- Blue Ribbon Panel on the economics of curation
  4. But now the data deluge is upon us. I use a few examples to highlight developments:
     -- Genome sequencing machines are doubling in output every nine months. This leaves the rather stately 18-month Moore’s Law doubling of computer performance in the shade.
     -- Astronomy, which only entered the digital era around 2000, projects 100,000 TB of data from LSST by the end of the decade. [2MASS completed 2001;
     -- Simulation
     -- And not just volume, but also complexity
     Trends: scale, complexity, distributed generation, …
     Source for genomic data: http://www.sciencemag.org/content/331/6018/728.short (“Output from next-generation sequencing (NGS) has grown from 10 Mb per day to 40 Gb per day on a single sequencer, and there are now 10 to 20 major sequencing labs worldwide that have each deployed more than 10 sequencers”)
     Source for molecular biology databases: http://nar.oxfordjournals.org/content/39/suppl_1/D1.full.pdf+html
     Source for climate change image: http://serc.carleton.edu/details/images/17685.html
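The gap between the two doubling times in this note compounds quickly. A short worked comparison, assuming idealized exponential growth (doubling every 9 months for sequencing output, every 18 months for the Moore's Law figure the note uses):

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Growth after `months` for a quantity that doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


for months in (18, 36, 72):
    sequencing = growth_factor(months, 9)   # sequencing output: x2 every 9 months
    moore = growth_factor(months, 18)       # Moore's Law figure used in the note
    print(f"{months} months: sequencing x{sequencing:g}, "
          f"Moore x{moore:g}, gap x{sequencing / moore:g}")
```

The ratio of the two curves is itself 2^(t/18): after three years, data has grown 16× while computer performance has grown 4×, and the gap doubles every further 18 months.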
  5. Not just small labs: medium-scale science too. E.g., the Dark Energy Survey.
  6. For many researchers, projects, and institutions, large data volumes are not an opportunity but a fundamental challenge to their competitiveness as researchers. How can they keep up?
  7. 200 universities * 250 faculty per university = 5,000
     Summary:
     -- Big projects can build sophisticated solutions to IT problems
     -- Small labs and collaborations have problems with both
     -- They need solutions, not toolkits: ideally, outsourced solutions
  8. Need date
  9. Of course, people also make effective use of IaaS, but only for more specialized tasks
  10. More specifically, the opportunity is to apply a very modern technology, software as a service (SaaS), to address a very modern problem, namely the enormous challenges inherent in translating revolutionary 21st-century technologies into scientific advances. Midway’s SaaS approach will address these challenges, both making powerful tools far more widely available and reducing the cycle time associated with research and discovery.
     -- Achieve economies of scale
     -- Reduce cost per researcher dramatically
     -- Achieve positive returns to scale (PRTS)
     -- Most academic solutions do NOT have PRTS
     -- Most industrial solutions DO have PRTS
  11. So let’s look at that list again. My colleagues and I started an effort a little while ago aimed at applying SaaS to one of these tasks …
  12. Example: a small lab generates data at the Texas Advanced Computing Center or the Advanced Photon Source and needs to move it back to the lab. Or: it needs to move data from an experimental facility (e.g., a sequencing center or the Dark Energy Survey) to a computing facility for analysis.
  13. Data movement is conceptually simple, but can be surprisingly difficult
  14. Why? Discover endpoints, determine available protocols, negotiate firewalls, configure software, manage space, determine required credentials, configure protocols, detect and respond to failures, identify, diagnose, and correct network misconfigurations, …
  15. • Reliable file transfer:
       – Easy “fire and forget” file transfers
       – Automatic fault recovery
       – High performance
       – Across multiple security domains
     • No IT required:
       – No client software installation
       – New features automatically available
       – Consolidated support and troubleshooting
       – Works with existing GridFTP servers
       – Globus Connect solves the “last mile problem”
  16. I’ll talk about integration with the Galaxy workflow system later …
  17. Reduce costs. Improve performance. Enable new science.
  18. What else do we need?
  19. Add university logos?
  20. Slide 33: Is the task of creating reusable workflows part of these 6 steps? Is publication and discovery of workflows/derived data products part of this as well? Is reproducible research part of it as well?
  21. Researchers vote with their dollars
  22. Before:
     -- Lots of little labs
     -- Big science
     -- XSEDE
     After: lots of empowered SMLs (small and medium labs), entrepreneurship in science, reproducible/reusable research, etc.