Virtual Science in the Cloud. Roy Williams, California Institute of Technology
The New Science: humans, clouds, sensors
 beginner to expert; sharing; logins and access; click to code to workflow
 personal storage; big data and replication; compute and scaling; software as component
 interoperability; survey and event; control or autonomous
Compute Services Registry Getting Data
Service Oriented Architecture
 1. publish: the service places its contract in the registry
 2. find: the client looks up the service contract in the registry
 3. bind: the client sends requests to the service and receives responses
 Principle: Click or Code
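The publish, find, bind cycle on this slide can be sketched with a toy in-memory registry. This is only an illustration under stated assumptions: the names `Registry`, `publish`, `find`, `bind`, and the `cone_search` function are hypothetical and do not come from any VO library.

```python
from typing import Callable, Dict, List


class Registry:
    """Toy in-memory registry: maps a service name to a contract and a callable."""

    def __init__(self) -> None:
        self._services: Dict[str, dict] = {}

    def publish(self, name: str, contract: str, endpoint: Callable) -> None:
        # Step 1: the provider publishes its service contract.
        self._services[name] = {"contract": contract, "endpoint": endpoint}

    def find(self, keyword: str) -> List[str]:
        # Step 2: the client searches the contracts for a keyword.
        return sorted(
            name
            for name, record in self._services.items()
            if keyword.lower() in record["contract"].lower()
        )

    def bind(self, name: str) -> Callable:
        # Step 3: the client binds to the service so it can send requests.
        return self._services[name]["endpoint"]


def cone_search(ra: float, dec: float, radius: float) -> str:
    # Hypothetical service body; a real service would return a VOTable.
    return f"objects within {radius} deg of ({ra}, {dec})"


registry = Registry()
registry.publish("cone", "Cone search: position and radius to object list", cone_search)
found = registry.find("radius")
service = registry.bind(found[0])
response = service(180.0, -30.0, 0.5)
```

The same three calls work whether a person clicks through a registry portal or a script does the lookup, which is the "Click or Code" principle.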
VO Data Services
 Cone Search: radius + position in, list of objects out, encoded as VOTable
 Simple Image Access Protocol
 Simple Spectrum Access Protocol: spectra have subtleties, so the protocol is more complicated
 Astronomical Data Query Language: for database queries; core SQL functions plus astronomy-specific extensions (sky region, Xmatch)
 Table Access Protocol: exposes relational databases (What tables? What table schema? Here is a query in ADQL)
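As a rough sketch of how a client might address two of these services: a Simple Cone Search request carries `RA`, `DEC`, and `SR` (search radius) in decimal degrees, and a synchronous Table Access Protocol request carries an ADQL string in `QUERY`. The base URLs, the table name `catalog.objects`, and the column names `ra` and `dec` are placeholders assumed for illustration; real values come from the registry entry of each service.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra: float, dec: float, sr: float) -> str:
    """Build a Simple Cone Search request (all values in decimal degrees)."""
    params = {"RA": ra, "DEC": dec, "SR": sr}
    return f"{base_url}?{urlencode(params)}"


def adql_cone_query(table: str, ra: float, dec: float, radius: float, top: int = 10) -> str:
    """Build an ADQL query selecting rows inside a circle on the sky.

    Assumes the table has columns named `ra` and `dec` (hypothetical).
    """
    return (
        f"SELECT TOP {top} * FROM {table} "
        f"WHERE CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra}, {dec}, {radius})) = 1"
    )


def tap_sync_url(base_url: str, adql: str) -> str:
    """Build a synchronous TAP request that carries the ADQL query."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{base_url}/sync?{urlencode(params)}"


# Placeholder endpoints, not real services.
url = cone_search_url("https://example.org/scs", 180.0, -30.0, 0.5)
adql = adql_cone_query("catalog.objects", 180.0, -30.0, 0.5)
tap = tap_sync_url("https://example.org/tap", adql)
```

Both services answer with a VOTable, so one parser on the client side can serve either path.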
VO Compute Services
 Asynchronous: may not get an immediate answer, just a place to check back
 Security: expensive resources, big requests, sequestered data; strong or weak or none
 Scalable: graduated path to powerful computation and big data
 Cloud store: VOSpace, sharable
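The asynchronous pattern ("just get a place to check back") might look like the following sketch. `AsyncService` is a hypothetical in-memory stand-in for a remote service, and the status names `RUNNING` and `COMPLETED` are assumptions chosen for illustration.

```python
import itertools


class AsyncService:
    """Hypothetical service that finishes a job after a fixed number of status checks."""

    def __init__(self, polls_to_finish: int = 3) -> None:
        self._ids = itertools.count(1)
        self._jobs = {}
        self._polls_to_finish = polls_to_finish

    def submit(self, request: str) -> str:
        # The caller gets a job identifier back immediately, not the answer.
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"request": request, "polls": 0}
        return job_id

    def status(self, job_id: str) -> str:
        job = self._jobs[job_id]
        job["polls"] += 1
        return "COMPLETED" if job["polls"] >= self._polls_to_finish else "RUNNING"

    def result(self, job_id: str) -> str:
        job = self._jobs[job_id]
        if job["polls"] < self._polls_to_finish:
            raise RuntimeError("job not finished")
        return f"result of {job['request']}"


def wait_for(service: AsyncService, job_id: str, max_polls: int = 10) -> str:
    """Check back until the job completes; a real client would sleep between polls."""
    for _ in range(max_polls):
        if service.status(job_id) == "COMPLETED":
            return service.result(job_id)
    raise TimeoutError(f"{job_id} still running after {max_polls} polls")


service = AsyncService()
job_id = service.submit("crossmatch A with B")
answer = wait_for(service, job_id)
```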
VO Registry: publish, find, bind
 Registry metadata: descriptions of data collections, data delivery services, organizations, etc.
 Based on Dublin Core with astronomy-specific extensions
 Represented as XML schema; extensible
 Contents stored in Resource Registries, which exchange metadata records through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Distributed Registry
 Astrogrid, CfA, NCSA, CDS, ESO, STScI/JHU, NOAO, Caltech, HEASARC, JapanVO
 Ongoing harvesting March 07 (CfA, ESO, NOAO soon)
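A hedged sketch of what harvesting over OAI-PMH involves: the harvester issues a `ListIdentifiers` (or `ListRecords`) request and reads record headers from the XML response. The base URL and the sample response below are made up for illustration. The `ivo_vor` metadata prefix is the one the IVOA Registry Interfaces standard defines for VOResource records; the slide itself does not name it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, metadata_prefix, from_date=None):
    """Build an OAI-PMH ListIdentifiers request, optionally incremental."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    return f"{base_url}?{urlencode(params)}"


def parse_identifiers(xml_text):
    """Extract the record identifiers from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Made-up response body, trimmed to the parts this sketch reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>ivo://example.authority/catalog</identifier>
      <datestamp>2007-03-01</datestamp>
    </header>
    <header>
      <identifier>ivo://example.authority/images</identifier>
      <datestamp>2007-03-02</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

harvest_url = list_identifiers_url("https://example.org/oai", "ivo_vor", "2007-03-01")
identifiers = parse_identifiers(SAMPLE_RESPONSE)
```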
Semantics & Search
 Identifiers: ivo://nasa.gsfc.gcn/SWIFT#BAT_GRB_Pos_374875-722
 Free tags: beard, Fred, pudding
 Controlled vocabulary (UCD): phot.flux;em.ir
 Controlled vocabulary interop (SKOS)
 Ontology: Greek isA Man, Socrates isA Greek, therefore Socrates isA Man
 Data models: each sky position will have a circular positional error estimate ...
 Text markup: Outflows from <object>NGC 666</object> are irregular ...
 Schema: columns are Magnitude, Position, Identifier, ...
 Metadata (registry) forms: Full Registry: true; ManagedAuthorities: authority, nasa.heasarc
 Formal service description
Cloud Based Tools: code & presentation; data
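Three of the items on this slide are easy to make concrete. The sketch below splits the example identifier into its parts, breaks the example UCD into words and atoms, and follows isA links transitively. The function names and the returned dictionary keys are assumptions for illustration, not part of any IVOA library.

```python
from urllib.parse import urlsplit


def parse_ivo_identifier(identifier):
    """Split an ivo:// identifier into authority, resource key, and local part."""
    parts = urlsplit(identifier)
    if parts.scheme != "ivo":
        raise ValueError(f"not an ivo identifier: {identifier}")
    return {
        "authority": parts.netloc,
        "resource_key": parts.path.lstrip("/"),
        "local_part": parts.fragment,
    }


def parse_ucd(ucd):
    """Split a UCD into words (separated by ';') and atoms (separated by '.')."""
    return [word.strip().split(".") for word in ucd.split(";")]


def is_a(facts, subject, target):
    """Return True if `subject` reaches `target` by following (child, parent) facts."""
    seen = set()
    frontier = [subject]
    while frontier:
        node = frontier.pop()
        for child, parent in facts:
            if child == node and parent not in seen:
                if parent == target:
                    return True
                seen.add(parent)
                frontier.append(parent)
    return False


ident = parse_ivo_identifier("ivo://nasa.gsfc.gcn/SWIFT#BAT_GRB_Pos_374875-722")
ucd_words = parse_ucd("phot.flux;em.ir")
facts = {("Greek", "Man"), ("Socrates", "Greek")}
socrates_is_man = is_a(facts, "Socrates", "Man")
```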
Open SkyQuery.net: VO Astronomical Crossmatch Service
 Presentation
 Query execution
 Workflow
Skyalert
 Push-based workflow; can be cyclic
 Portfolio aggregation by citation
 Annotation as software components
 Stream owner builds template
 Django, Python, jQuery; now 4 developers via SVN
Skyalert Stream Registry... will be VO registry
Roles at skyalert.org
 1. browse (human or robot): query, human computing, WWT/Google
 2. subscribe (human or robot)
 3. author (human or robot)
 4. annotate: contrib software components
 Diagram labels: archive, mining; push; inject; web portfolios; db; IM/tweet/email/TCP; triggers; actions
skyalert.org: cyclic workflow graph; alerts and event cascade
 Trigger: CRTS["Geometry"]["Moon angle"] > 30 and SDSS["Photoprimary"]["g-magnitude"] < 18
 Action: annotator, followup request
 Dynamically loads module: run(triggerEvent, portfolio): <business logic>
 Can build event and inject recursively
 Send message
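The trigger expression and the `run(triggerEvent, portfolio)` hook on this slide suggest a portfolio shaped like nested dictionaries keyed by stream name. The sketch below assumes that shape; the body of `run`, the helper `on_event`, and the returned dictionary are hypothetical and are not Skyalert's actual code.

```python
def trigger(portfolio):
    """Evaluate the slide's example trigger against a nested-dict portfolio."""
    try:
        moon_angle = portfolio["CRTS"]["Geometry"]["Moon angle"]
        g_magnitude = portfolio["SDSS"]["Photoprimary"]["g-magnitude"]
    except KeyError:
        # An annotation the trigger needs has not arrived in the portfolio yet.
        return False
    return moon_angle > 30 and g_magnitude < 18


def run(triggerEvent, portfolio):
    """Hypothetical action module; the real business logic is left open on the slide."""
    return {"action": "followup request", "event": triggerEvent}


def on_event(event_id, portfolio, actions):
    """Run every action when the trigger fires, otherwise do nothing."""
    if trigger(portfolio):
        return [action(event_id, portfolio) for action in actions]
    return []


portfolio = {
    "CRTS": {"Geometry": {"Moon angle": 45.0}},
    "SDSS": {"Photoprimary": {"g-magnitude": 17.2}},
}
results = on_event("evt-1", portfolio, [run])
```

An action that builds a new event and injects it would cause `on_event` to be called again for that event, which is how the workflow graph can become cyclic.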
Skyalert-LSST (skyalert.org)
 Data service from CRTS and Skyalert
 Gets JSON event list via HTTP
 LSST building Skyalert clone
 Pasadena and Tucson both get events by Jabber/XMPP
 "Unknown" is now a choice of Cataclysmic Variable, Supernova, Blazar Outburst, Active Galactic Nucleus Variability, UVCeti Variable, Asteroid, Variable, Mira Variable, High Proper Motion Star, Comet, Eclipsing Variable, Gamma Ray Burst Afterglow, Microlensing, Nova, Planetary Microlensing, RRLyrae Variable, Tidal Disruption Flare
Tier1 and Tier2 Event Nodes (evolving in IVOA)
 Diagram labels: Brokering Registry; Tier1; Event Servers; Tier2; Authoring; Distribution
 Transport: Jabber/XMPP or raw socket
 Tier1: immediate forwarding; reliable? topology?
 Tier2: subscription, repository, query, portfolio, registry, machine learning, substreams, etc.
NSF TeraGrid
 11 Resource Provider sites, >2 Petaflop HPC & >27000 CPUs, >3 Petabyte disk, >60 PB tape
 Fast network, visualization, experiments (VMs, GPUs, FPGAs)
 For US researchers and their collaborators through national peer-review process
Architectures 2010
 Science Gateway (no architecture!)
 Node farm (Condor)
 Parallel computing: message-passing (MPI); shared memory
 Graphics Processing Units: 10^4 independent tiny threads
 Data intensive: flash memory (TG/UCSD); Graywulf (JHU/Pan-STARRS)
 Immediate resources
Science Gateways
 Biology and Biomedicine Science Gateway; Open Life Sciences Gateway; The Telescience Project; Grid Analysis Environment (GAE); Neutron Science Instrument Gateway; TeraGrid Visualization Gateway, ANL; BIRN; Open Science Grid (OSG); Special PRiority and Urgent Computing Environment (SPRUCE); National Virtual Observatory (NVO); Arroyo Adaptive Optics; Linked Environments for Atmospheric Discovery (LEAD); Computational Chemistry Grid (GridChem); Computational Science and Engineering Online (CSE-Online); GEON (GEOsciences Network); Network for Earthquake Engineering Simulation (NEES); SCEC Earthworks Project; Network for Computational Nanotechnology and nanoHUB; GIScience Gateway (GISolve); Gridblast Bioinformatics Gateway; Earth Systems Grid; Astrophysical Data Repository (Cornell)
 Slide courtesy of Nancy Wilkins-Diehr
GPU for molecular modelling
Pan-STARRS PS1 compute
 User facing: SQL/CasJobs workbench; privacy/share; stored queries
 Data valet: load/validate; merge; crawl; replicate; log
 Workflow; data head/slice; hot/warm/cold
 Fault tolerance: multiple replication, fault workflow
 Cost and energy carefully considered
 Future: Hadoop/MapReduce
Cloud Supercomputing?
 TeraGrid/Globus vs Cloud/Amazon MI
 Both are ways to get wholesale computing
 Both provide IaaS, Infrastructure as a Service
 Virtual Machine more popular than CTSS stack
 What about parallelism? I/O speed? GPUs? etc.
 Watch 3leaf and ScaleMP for these
Science and Web 2.0
 Easy for groups to form and collaborate
 Integrates with user workspace: iGoogle and OpenSocial, alongside other aspects of their lives
 Use existing tools: SlideShare, blogs, Google gadgets, Facebook, Gwave, Flickr, YouTube
 Sharing workspace: electronic log; provenance; Virtual Data as "equivalent script"
Science and Web 2.0
 Server delivers only code; browser makes presentation
 Ajax and Ajaj and HTTP "long poll"
 jQuery and Google toolkit: see WWT and GSky in Skyalert
 "Everything is a wiki", or a wave?
 Visible/editable by group/s
Adaptive Optics Gateway
 30-meter telescope
 Planet finding coronagraph
 4-day run for 4 sec!
 Parallel parameter sweeps
 Proposed upgrade of the Palomar AO system to a 56x56 subaperture system
Arroyo
Arroyo Gateway Architecture
 1. User uses HTML/JS from webserver to create job definition.
 2. Daemon is polling and sees new job, makes local space for it.
 3. Daemon starts job on compute resource and updates job status.
 4. Daemon fetches and updates status of running job. Repeat.
 5. Job writes output to remote space.
 6. Daemon copies output from remote to local, updates job status.
 7. User fetches results from webserver.
 Components: webserver (Django, MySQL) holding job definitions and status; daemon; local space for results; remote space for results; retail vs wholesale computing
 RW and J. Bunn
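The daemon's part of this architecture is a small state machine over the job table. The sketch below is an assumption-laden illustration, not the Arroyo gateway code: `FakeComputeResource` stands in for the remote machine, a plain dictionary stands in for the MySQL job table, and the status names `new`, `running`, and `done` are invented here.

```python
import shutil
import tempfile
from pathlib import Path


class FakeComputeResource:
    """Stands in for the remote compute resource and its remote results space."""

    def __init__(self, remote_root):
        self.remote_root = Path(remote_root)
        self._polls = {}

    def start(self, job_id, definition):
        # For simplicity the fake job writes its output as soon as it starts.
        self._polls[job_id] = 0
        out_dir = self.remote_root / job_id
        out_dir.mkdir(parents=True)
        (out_dir / "result.txt").write_text(f"output for {definition}")

    def is_finished(self, job_id):
        # Pretend the job needs two status checks before it reports completion.
        self._polls[job_id] += 1
        return self._polls[job_id] >= 2


def daemon_pass(jobs, resource, local_root):
    """One polling pass over the job table."""
    for job_id, job in jobs.items():
        if job["status"] == "new":
            # See the new job, make local space, start it, update status.
            (local_root / job_id).mkdir(parents=True, exist_ok=True)
            resource.start(job_id, job["definition"])
            job["status"] = "running"
        elif job["status"] == "running" and resource.is_finished(job_id):
            # Copy output from remote space to local space, update status.
            src = resource.remote_root / job_id / "result.txt"
            shutil.copy(src, local_root / job_id / "result.txt")
            job["status"] = "done"


with tempfile.TemporaryDirectory() as tmp:
    local_root = Path(tmp) / "local"
    resource = FakeComputeResource(Path(tmp) / "remote")
    jobs = {"job1": {"definition": "56x56 subaperture sweep", "status": "new"}}
    history = []
    for _ in range(3):
        daemon_pass(jobs, resource, local_root)
        history.append(jobs["job1"]["status"])
    # The webserver would serve this file to the user in the final step.
    fetched = (local_root / "job1" / "result.txt").read_text()
```

Because the webserver only reads and writes the job table and the local results space, it never needs credentials for the compute resource; only the daemon does.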
Pegasus workflow E. Deelman
E. Deelman, G. Berriman, RW, et al
LIGO Grid
 Now 45,000 jobs per month
 Pegasus for load balancing?
 Detailed progress reports during run
 Strong/weak security model with certificates
Wide-area Mosaicking: 158 feet, Griffith Observatory, Los Angeles
Citizen Science
Human Volunteers
 Science layer: describe what you see in image; each person has level of expertise; how to use results most effectively; Galaxyzoo.org and citizensky.org are good models
 Game layer: makes people come back; top 10 ranking etc.; anonymous partner a la gwap.com
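One simple way to use volunteer results when "each person has level of expertise" is a weighted vote. The slide does not prescribe a method, so this is only one possible sketch; the weights, user names, and labels are invented for illustration.

```python
from collections import defaultdict


def weighted_consensus(votes, expertise, default_weight=1.0):
    """Combine (user, label) votes, weighting each vote by the user's expertise.

    Returns the winning label and its share of the total weight.
    """
    totals = defaultdict(float)
    for user, label in votes:
        totals[label] += expertise.get(user, default_weight)
    # Sorting first makes ties resolve alphabetically, so the result is deterministic.
    winner = max(sorted(totals), key=totals.get)
    share = totals[winner] / sum(totals.values())
    return winner, share


votes = [("ann", "spiral"), ("bob", "elliptical"), ("cat", "elliptical")]
expertise = {"ann": 3.0, "bob": 1.0, "cat": 1.0}
label, share = weighted_consensus(votes, expertise)
```

Here one experienced volunteer outweighs two newcomers; the returned share gives a rough confidence that could decide whether an image needs more classifications.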

Virtual Science in the Cloud

  • 1. Virtual Science in the Cloud. Roy Williams, California Institute of Technology
  • 2. humans clouds sensors beginner to expert sharing logins and access click to code to workflow personal storage big data and replication compute and scaling software as component interoperability survey and event control or autonomous The New Science
  • 4. Service Oriented Architecture: 1. publish (service to registry); 2. find (client to registry); 3. bind (client sends request to service, service returns response, under the service contract). Principle: Click or Code
  • 5. VO Data Services. Cone Search: radius + position -> list of objects, encoded as VOTable. Simple Image Access Protocol. Simple Spectrum Access Protocol: spectra have subtleties -> protocol more complicated. Astronomical Data Query Language: for database queries; core SQL functions plus astronomy-specific extensions (sky region, Xmatch). Table Access Protocol: exposes relational databases (what tables, what table schema, here is a query in ADQL)
  • 6. VO Compute Services. Asynchronous: may not get immediate answer, just get a place to check back. Security: expensive resources, big requests, sequestered data; Strong or Weak or None. Scalable: graduated path to powerful computation and big data. Cloud store: VOSpace, sharable
  • 7. VO Registry: publish -- find -- bind. Registry Metadata: descriptions of data collections, data delivery services, organizations, etc. Based on Dublin Core with astronomy-specific extensions. Represented as XML schema; extensible. Contents stored in Resource Registries, which exchange metadata records through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
  • 8. Distributed Registry: Astrogrid, CfA, NCSA, CDS, ESO, STScI/JHU, NOAO, Caltech, HEASARC, JapanVO. Ongoing harvesting March 07 (CfA, ESO, NOAO soon)
  • 9. Semantics & Search. Identifiers: ivo://nasa.gsfc.gcn/SWIFT#BAT_GRB_Pos_374875-722. Free tags: beard, Fred, pudding. Controlled Vocab (UCD): phot.flux;em.ir. Controlled Vocab interop (SKOS). Ontology: Greek isA Man, Socrates isA Greek -> Socrates isA Man. Data Models: Each sky position will have a circular positional error estimate ... Text markup: Outflows from <object>NGC 666</object> are irregular ... Schema: Columns are Magnitude, Position, Identifier, ... Metadata (registry) forms: Full Registry: true; ManagedAuthorities: authority, nasa.heasarc. Formal service description
  • 10. Cloud Based Tools code & presentation data
  • 16. Skyalert: Push-based workflow; can be cyclic. Portfolio aggregation by citation. Annotation as software components. Stream owner builds template. Django, Python, jQuery. Now 4 developers via SVN
  • 17. Skyalert Stream Registry... will be VO registry
  • 18. Roles: human or robot 1. browse query, human computing, WWT/Google skyalert.org human or robot 2. subscribe human or robot 3. author 4. annotate contrib software components archive, mining push inject web portfolios db IM/tweet/email/TCP triggers actions
  • 19. skyalert.org: Cyclic workflow graph. Trigger: CRTS["Geometry"]["Moon angle"] > 30 and SDSS["Photoprimary"]["g-magnitude"] < 18. Action: annotator, followup request; dynamically loads module run(triggerEvent, portfolio): <business logic>; can build event and inject recursively; send message. Alerts and event cascade
  • 21. Data service from CRTS and Skyalert
  • 22. gets JSON event list via http
  • 24. Pasadena and Tucson both get events by Jabber/XMPP
  • 25. “Unknown” is now choice of: Cataclysmic Variable, Supernova, Blazar Outburst, Active Galactic Nucleus Variability, UV Ceti Variable, Asteroid, Variable, Mira Variable, High Proper Motion Star, Comet, Eclipsing Variable, Gamma Ray Burst Afterglow, Microlensing, Nova, Planetary Microlensing, RR Lyrae Variable, Tidal Disruption Flare
  • 27. Event Servers. Tier2 Authoring Distribution Jabber/XMPP or raw socket. Tier1: Immediate Forwarding, Reliable?, Topology? Tier2: Subscription, Repository, Query, Portfolio, Registry, Machine Learning, Substreams etc etc
  • 29. 11 Resource Provider sites, >2 Petaflop HPC & >27000 CPUs, >3 Petabyte disk, >60 PB tape
  • 30. Fast network, Visualization, experiments (VMs, GPUs, FPGAs)
  • 32. Architectures 2010: Science Gateway (no architecture!); Node farm (Condor); Parallel computing: message-passing (MPI), shared memory; Graphics Processing Units: 10^4 independent tiny threads; Data Intensive: flash memory (TG/UCSD), Graywulf (JHU/Pannstarrs); Immediate resources
  • 33. Science Gateways: Biology and Biomedicine Science Gateway; Open Life Sciences Gateway; The Telescience Project; Grid Analysis Environment (GAE); Neutron Science Instrument Gateway; TeraGrid Visualization Gateway, ANL; BIRN; Open Science Grid (OSG); Special PRiority and Urgent Computing Environment (SPRUCE); National Virtual Observatory (NVO); Arroyo Adaptive Optics; Linked Environments for Atmospheric Discovery (LEAD); Computational Chemistry Grid (GridChem); Computational Science and Engineering Online (CSE-Online); GEON (GEOsciences Network); Network for Earthquake Engineering Simulation (NEES); SCEC Earthworks Project; Network for Computational Nanotechnology and nanoHUB; GIScience Gateway (GISolve); Gridblast Bioinformatics Gateway; Earth Systems Grid; Astrophysical Data Repository (Cornell). Slide courtesy of Nancy Wilkins-Diehr
  • 34. GPU for molecular modelling
  • 35. Pannstarrs PS1 compute User facing SQL/casjobs workbench privacy/share stored queries Data valet load/validate merge crawl replicate log workflow workflow data head/slice hot/warm/cold Fault tolerance: multiple replication, fault workflow Cost and energy carefully considered Future: Hadoop/Mapreduce
  • 36. Cloud Supercomputing? Teragrid/Globus vs Cloud/Amazon MI. Both ways to get wholesale computing. Both provide IaaS (Infrastructure as a Service). Virtual Machine more popular than CTSS stack. What about parallelism? I/O speed? GPUs? etc. Watch 3leaf and ScaleMP for these
  • 37. Science and Web 2.0 Easy for groups to form and collaborate Integrates with user workspace iGoogle and OpenSocial alongside other aspects of their lives Use existing tools SlideShare, blogs, google gadgets, facebook, Gwave, Flickr, YouTube Sharing workspace Electronic log Provenance Virtual Data as “equivalent script”
  • 38. Science and Web 2.0 Server delivers only code Browser makes presentation Ajax and Ajaj and Http “long poll” Jquery and Google toolkit see WWT and GSky in Skyalert “Everything is a wiki” or a wave? Visible/editable by group/s
  • 41. Planet finding coronagraph
  • 42. 4-day run for 4-sec!
  • 43. Parallel parameter sweeps: proposed upgrade of the Palomar AO system to a 56x56 subaperture system
  • 45. Arroyo Gateway Architecture. Retail side: webserver (Django, MySQL) holding job definitions and status, with local space for results. Wholesale computing side: compute resource with remote space for results. A daemon connects the two. 1. Use HTML/JS from webserver to create job definition. 2. Daemon is polling & sees new job, makes local space for it. 3. Start job on compute resource & update job status. 4. Fetch & update status of running job. Repeat. 5. Output to remote space. 6. Daemon copies output from remote to local, updates job status. 7. User fetches results from webserver. RW and J. Bunn
  • 47. E. Deelman, G. Berriman, RW, et al
  • 49. now 45,000 jobs per month
  • 51. Detailed progress reports during run
  • 53. Wide-area Mosaicking 158 feet Griffith Observatory, Los Angeles
  • 55. Human Volunteers Science Layer Describe what you see in image Each person has level of expertise How to use results most effectively Galaxyzoo.org, citizensky.org good models Game Layer Makes people come back Top 10 ranking etc Anonymous partner a la gwap.com
  • 56. Human Volunteer Evidence Donalek et al arXiv:0810.4945 [astro-ph] 4 of 10 say artifact artifact
  • 57. RW and C. Donalek
  • 60. Classic Machine Learning: Metric in “Feature Space”. Relevance Vector Machine (Tipping). Feature Vectors. Learning from Training set. Picking relevant lessons. RW and J. Beck
  • 68. User Interface (wrong) and now do some science.... Finally get some help Ask for help Translate VOTable format Learn to use VO Registry Read about web services Read about XML Wait for account Register
  • 69. User interface (right) in Darwinian evolution every small change must give benefit Power user Learn the VO structure hey this is interesting .... Run bigger job more science.... Register some science.... Web form Anonymous be careful with complex authentication!
  • 70. Steering the Ship. Short term Pragmatism: useful tools now; simple protocols (e.g. cone search); “just use RA and Dec”. vs Long term Architecture: modular suite of interoperable tools; sophisticated protocols (e.g. skynode); sophisticated Space-Time coordinates
  • 85. Interfaces: A Data Model is a bridge from community to computers
  • 86. What is a Data Center? machines services doesn’t matter where or how testing testing testing do we have enough power and HVAC?
  • 87. Complex science, Complex machines. Separate science user from complexity. Must have domain science context. Making simple things simple but Power to scale up. Drill-down if wanted. Machines are not the objective. Science through data, compute, sharing
  • 88. eScience is for People, right? Getting Started Help Desk Forum Documentation Knowledge Base Calendar Contact Us Social Media Blog/newsfeed Campus Champions Summer Schools Advanced Support for Developers Education
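
Slide 5 describes Cone Search as "radius + position -> list of objects". A minimal sketch of building such a request, assuming the standard Cone Search query parameters RA, DEC and SR (all in decimal degrees); the base URL is a placeholder and the function name is hypothetical, neither comes from the slides:

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Cone Search request URL (hypothetical helper).

    RA, DEC and SR are the Cone Search parameter names; the service is
    expected to answer with a VOTable listing the objects in the cone.
    """
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")

    # Pick the right joiner depending on how the base URL already ends.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"

    params = {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg}
    return base_url + sep + urlencode(params)


# Placeholder service address, used only for illustration.
print(cone_search_url("http://example.org/cone", 180.0, -30.5, 0.1))
```

Fetching the URL and parsing the returned VOTable is left out here; this only shows how little a client needs in order to "click or code" against such a service.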
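
Slide 19 shows a Skyalert trigger expression and an action entry point `run(triggerEvent, portfolio)`. The following is a rough sketch of how such a trigger might be evaluated, assuming the portfolio is a nested dictionary keyed by stream, table and field as the slide's bracket notation suggests. The `on_update` dispatcher and the body of `run` are hypothetical stand-ins for the slide's `<business logic>`, not Skyalert's actual code:

```python
def trigger(portfolio):
    """Evaluate the trigger from slide 19 against a portfolio dict."""
    try:
        moon_angle = portfolio["CRTS"]["Geometry"]["Moon angle"]
        g_mag = portfolio["SDSS"]["Photoprimary"]["g-magnitude"]
    except KeyError:
        # An annotation has not arrived yet, so the trigger cannot fire.
        return False
    return moon_angle > 30 and g_mag < 18


def run(trigger_event, portfolio):
    """Hypothetical action: describe a follow-up request for the event."""
    return {"action": "followup request", "event": trigger_event}


def on_update(trigger_event, portfolio):
    """Hypothetical dispatcher: run the action only if the trigger holds."""
    if trigger(portfolio):
        return run(trigger_event, portfolio)
    return None


portfolio = {
    "CRTS": {"Geometry": {"Moon angle": 45.0}},
    "SDSS": {"Photoprimary": {"g-magnitude": 17.2}},
}
print(on_update("event-001", portfolio))
```

Because an action may itself build and inject a new event, calling `on_update` again on the resulting portfolio is what would make the workflow graph cyclic, as the slide notes.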