These slides are from my Ph.D. defense at the University of California, Santa Barbara. They describe the research tools we contribute to advance how science is performed on cloud systems.
Ph.D. Defense
1. Automated Configuration and Deployment of Applications in Heterogeneous Cloud Environments
Chris Bunch
Ph.D. Defense
November 30, 2012
2. Public Cloud Computing
• Utility-oriented approach to computing
• Pay for only resources that you use
• Rent resources from large datacenters maintained by Amazon, Microsoft, Google
• Don’t maintain a rack in your office - just use somebody else’s rack
3. Using the Cloud for Apps
• Cloud services have seen uptake in:
• Web services domain
• High performance computing
• General-purpose applications
4. Challenges in Cloud Adoption
• Primary barriers to entry:
• Wide array of services
• Varying cost models
• Many technologies providing APIs
5. Plethora of Services
• Storage Services
• Queue Services
• Compute Services
• Fully Managed Software Stacks
• Web services only
• MapReduce only
6. Varying Cost Models
• Unlimited usage per-hour (EC2)
• Unlimited usage per-wall-clock-hour (Azure)
• First 15 minutes, then charged per minute (App Engine)
• Meter per API call (SQS, App Engine)
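The time-based models above differ mainly in how runtime is rounded into billable units. The following is a minimal sketch of that arithmetic, not vendor billing logic: the function names are hypothetical, and each rounding rule is only one reading of the corresponding bullet.

```python
import math


def per_hour_units(runtime_s):
    """Billable hours when every started hour is charged in full."""
    return max(1, math.ceil(runtime_s / 3600))


def wall_clock_hour_units(start_s, runtime_s):
    """Billable hours when each wall-clock hour the job touches is charged.

    start_s is the job's start time in seconds since some hour boundary.
    """
    first_hour = start_s // 3600
    if runtime_s <= 0:
        return 1
    last_hour = (start_s + runtime_s - 1) // 3600
    return int(last_hour - first_hour + 1)


def minimum_then_per_minute_units(runtime_s, minimum_min=15):
    """Billable minutes with a minimum charge, then per-minute metering."""
    return max(minimum_min, math.ceil(runtime_s / 60))
```

Under these assumptions, a ten-minute job that starts five minutes before the top of the hour is billed one hour under the first model, two hours under the second, and fifteen minutes under the third.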
7. Accessing Services via APIs
• Need an API to connect your application to the cloud service
• First-party native libraries, per-language
• Typically only for “popular” languages
• Cross-language serialization services
• Convert from your language to “popular” language
8. Thesis Question
• How can we enable applications to be executed on cloud systems by automatically configuring and deploying them across cloud offerings that vary in the type of service offered, the cost model employed, and the APIs through which services are exposed?
9. Our Solution
• Provide research tools to execute computationally intensive applications
• Automatically configure and deploy applications for use with cloud services
• Programming language support, to facilitate expressive workflows
10. Design Space
Domain                      Language / Platform Support
Web Services                AppScale (IEEE CLOUD10)
High Performance Computing  Neptune (ScienceCloud10, DataCloud12)
General Purpose             MEDEA (IPDPS13)*, Exodus (CCGrid13)*
13. PaaS for Science
• Need a cloud that is extensible to:
• Services from competing cloud vendors
• Differing cost models from each cloud
• Varying APIs offered by cloud vendors
• And it must be open source!
14. Introducing AppScale
• An open source implementation of the Google App Engine APIs
• Deploys over Amazon EC2 or Eucalyptus
• Configures and deploys automatically
• User only needs to specify the number of nodes to run over
16. Limitations
• “Recipes” are statically defined
• Limited to three-tier web applications
• Runtime environment is restricted to enable autoscaling
• Not cost-aware
17. Design Space
Domain                      Language / Platform Support
Web Services                AppScale (IEEE CLOUD10)
High Performance Computing  Neptune (ScienceCloud10, DataCloud12)
General Purpose             MEDEA (IPDPS13)*, Exodus (CCGrid13)*
18. HPC in the Cloud
• Easy access to vast resources
• Hard to automatically configure and deploy libraries
• Requires in-depth knowledge of each technology required
• Hard to get good performance on an opaque cloud
• Wide range of APIs for similar services
19. Introducing Neptune
• A domain specific language for running HPC applications
• Supports MPI, UPC, X10 programs
• Configures and deploys automatically
• Scientists need only specify the number of nodes to execute over
22. Automated Application Execution
• Calls to neptune() are translated into SOAP messages, dispatched to AppScale
• AppScale pulls in library support that details how to run each type of job
• Acquires nodes, runs job, saves output
• Cost awareness for VMs
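The idea of per-job-type library support can be pictured as a lookup table of static recipes. The sketch below is purely illustrative and is not AppScale or Neptune code: `RECIPES`, `build_command`, and the job fields are hypothetical names, and the only external fact it relies on is that `mpiexec -n <count>` launches an MPI program on that many processes.

```python
# Hypothetical static recipes: one command template per job type.
RECIPES = {
    "mpi": ["mpiexec", "-n", "{procs}", "{code}"],
}


def build_command(job):
    """Expand the recipe for job["type"] into an argument list."""
    recipe = RECIPES.get(job["type"])
    if recipe is None:
        raise ValueError(f"no recipe for job type {job['type']!r}")
    # Placeholders that do not appear in a part are simply ignored.
    return [part.format(procs=job["procs"], code=job["code"]) for part in recipe]
```

For example, `build_command({"type": "mpi", "procs": 4, "code": "./ring"})` yields the argument list for a four-process MPI run, while an unknown job type raises an error because no recipe has been defined for it.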
23. Limitations
• “Recipes” for each framework are static
• Must be pre-defined by an expert user
• Software must be pre-installed on VMs
• Metadata not easily accessible
• Limited by underlying hardware
24. Design Space
Domain                      Language / Platform Support
Web Services                AppScale (IEEE CLOUD10)
High Performance Computing  Neptune (ScienceCloud10, DataCloud12)
General Purpose             MEDEA (IPDPS13)*, Exodus (CCGrid13)*
25. Problem Domain
• Easy access to vast resources
• Hard to automatically configure and deploy
• Hard to evaluate services b/c of:
• The abstractions they expose
• The cost model they charge with
• Varying APIs for each language
26. Introducing MEDEA
• Extends Neptune to provide an execution model for applications
• Abstract away compute, storage, queue services via a common interface
• Automatically manage cost for the user
• Automatically connect competing APIs
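A common interface over competing services can be sketched as an abstract base class with one implementation per backend. This is a minimal illustration of the pattern only: `Storage`, `InMemoryStorage`, `BACKENDS`, and `storage_for` are hypothetical names, and the in-memory backend stands in for a real cloud datastore.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Interface that every storage backend is assumed to implement."""

    @abstractmethod
    def put(self, key, value):
        """Store value under key."""

    @abstractmethod
    def get(self, key):
        """Return the value stored under key."""


class InMemoryStorage(Storage):
    """Stand-in backend; a real one would call a cloud storage API."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items[key]


# Registry mapping a user-facing service name to its implementation.
BACKENDS = {"memory": InMemoryStorage}


def storage_for(name):
    """Instantiate the backend registered under name."""
    if name not in BACKENDS:
        raise ValueError(f"unknown storage service {name!r}")
    return BACKENDS[name]()
```

Application code written against `Storage` does not change when another backend is registered, which is the property the slide attributes to the common interface.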
27. High-Level Design
• Scripting language support
• Maximizes flexibility and interoperability with other code
• Deployment engine (PaaS layer)
• Automatically configure and deploy applications over cloud services
29. Scripting Language Support
• Extends the Neptune DSL
• Adds a function call, medea()
• Users specify code, inputs, services to use
• (M)essages the Deployment Service with this data, called a “task”
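A task, as described above, bundles the code, its inputs, and the services to use. The sketch below shows one way such a message could be built and serialized. The field names and the choice of JSON as the wire format are assumptions made for illustration, not the format MEDEA uses.

```python
import json


def make_task(code, inputs, compute, storage, queue):
    """Bundle what the user specifies into a single task dictionary."""
    return {
        "code": code,
        "inputs": list(inputs),
        "services": {"compute": compute, "storage": storage, "queue": queue},
    }


def encode_task(task):
    """Serialize a task so it can be sent to a deployment service."""
    return json.dumps(task, sort_keys=True)


def decode_task(message):
    """Recover the task dictionary on the receiving side."""
    return json.loads(message)
```

Keeping the message a plain dictionary of strings and lists means any language with a JSON library could produce or consume it.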
35. (A) Storage services
• Task Workers store the following outputs:
• Standard output of job
• Standard error of job
• Metadata
• User’s script (A)ccesses result of job
• Supports S3, App Engine, Azure, and AppScale datastores (HBase, Cassandra, etc.)
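The three outputs listed above can be captured with the standard library alone. In this sketch, a plain dictionary stands in for the storage service, and the key layout and metadata fields are hypothetical.

```python
import subprocess
import sys
import time


def run_and_store(command, store, prefix):
    """Run command, then save its stdout, stderr, and metadata in store."""
    started = time.monotonic()
    result = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.monotonic() - started
    store[f"{prefix}/stdout"] = result.stdout
    store[f"{prefix}/stderr"] = result.stderr
    store[f"{prefix}/metadata"] = {
        "returncode": result.returncode,
        "seconds": elapsed,
    }
    return result.returncode


# Example: run a trivial job with the current Python interpreter.
store = {}
exit_code = run_and_store([sys.executable, "-c", "print('hello')"], store, "job-1")
```

After the call, the user's script would read `store["job-1/stdout"]` to access the result of the job.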
36. Use Cases
• Execute scientific apps and share the results
• Execute quickly (but expensively)
• Execute inexpensively (but slowly)
• Community cloud for benchmarking programming language performance
37. Scientific Use Cases
• Computational systems biology application
• Simulates conditions found in yeast
• Written in Python, Java
• Deploy to EC2, App Engine, Azure
• All values are the average of five runs
40. Polyglot Science
• Implementations of the n-body application in eleven programming languages
• Execute with Amazon EC2, SQS, and S3
• Measure time taken to execute, cost
• All values are the average of ten runs
42. n-body in Amazon EC2
Language Per-Second Cost
C $0.0069
Java $0.0075
Python $0.5876
Ruby $2.1944
Scala $0.0075
43. n-body across clouds
Cloud                Cost To Execute
Amazon EC2           $0.32
App Engine (Java)    $0.0013
App Engine (Python)  $0.0049
Microsoft Azure      $0.02
44. Related Work
• Pegasus / Swift (WORKS ’11)
• YCSB (SOCC ’10), YCSB++ (SOCC ’11)
• Elastisizer (SOCC ’11)
• Condor / StratUm (BIOINFORMATICS ’12)
• AME (WORKS ’11)
• Google App Engine Pipeline API
45. Review
• MEDEA automatically configures and deploys applications, over multiple clouds
• Abstracts away cloud compute, storage, and queue services from the user
• Extensible to support other clouds
• Programming language support to enable Turing-complete workflows
46. Limitations
• Does not intelligently schedule
• Many different hardware profiles offered by compute services
• Hard to use them effectively b/c of:
• Opaque pricing models
• Lack of Cost APIs
47. Introducing Exodus
• An Application Programming Interface (API)
• Determines how to “optimally” execute tasks, when “optimal” means:
• Minimizing cost
• Minimizing total execution time
• User-defined functions
49. API Support
• Adds a Neptune function call, exodus()
• Users specify :optimize_for:
• Cost, performance, or a user-defined function
• Profiles code locally or remotely
• Estimates time and cost to use each instance type at each number of machines
• Constructs and executes tasks via MEDEA
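The selection step above can be sketched as a search over every (instance type, machine count) pair. This is a toy model, not the Exodus implementation: the instance names, speeds, and prices are made up, runtime is assumed to scale perfectly with speed and machine count, and billing is assumed to round up to whole hours per machine.

```python
import math

# Hypothetical instance types: name -> (relative speed, dollars per hour).
INSTANCE_TYPES = {
    "small": (1.0, 0.10),
    "large": (4.0, 0.50),
}


def candidates(baseline_seconds, max_machines):
    """Yield a time and cost estimate for every configuration."""
    for name, (speed, price) in INSTANCE_TYPES.items():
        for machines in range(1, max_machines + 1):
            seconds = baseline_seconds / (speed * machines)
            cost = math.ceil(seconds / 3600) * price * machines
            yield {"type": name, "machines": machines,
                   "seconds": seconds, "cost": cost}


# Built-in objectives; lower scores are better.
OBJECTIVES = {
    "cost": lambda c: c["cost"],
    "performance": lambda c: c["seconds"],
}


def choose(baseline_seconds, max_machines, optimize_for):
    """Pick the configuration minimizing a named or user-supplied score."""
    if isinstance(optimize_for, str):
        score = OBJECTIVES[optimize_for]
    else:
        score = optimize_for
    return min(candidates(baseline_seconds, max_machines), key=score)
```

Passing a callable as `optimize_for` covers the user-defined case, for example `lambda c: c["cost"] * 3600 + c["seconds"]` to trade one dollar against one hour of runtime.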
52. Cost-Aware Science
• Same app as evaluated with MEDEA
• Computational systems biology application
• Written in C
• Try to optimize cost, performance, or a weighted average of the two
• All values are the average of five runs
54. Related Work
• RO-BURST (CCGrid 2012)
• Cannot schedule a priori
• Bicer, Chiu, and Agrawal (CCGrid 2012)
• Cost-aware middleware for MapReduce
• Java apps only, can budget based on time or cost
55. Review
• Exodus automatically optimizes application deployment over multiple clouds
• Extensible to support evolving use cases
• Programming language support to enable Turing-complete problem descriptions
56. Contributions
• AppScale cloud platform
• Neptune programming language
• MEDEA extensions to Neptune
• Exodus extensions to MEDEA
• In combination
57. Impact
• Publications in peer-reviewed conferences
• Best Paper award for Neptune at HPDC’s ScienceCloud
• All work done released as open source
• >10,000 downloads of AppScale / Neptune
58. Future Work
• Autoscaling in conjunction with IaaS
• Adaptive profiling for app execution
• Cost-aware fault tolerance
• Budgeting and deadlines for entire Exodus programs, across invocations to exodus()
59. Thanks
• To my advisor, Chandra Krintz
• To my committee, Amr El Abbadi and John Gilbert
• To the AppScale team, especially co-lead Navraj Chohan
• To my family for their continued support
• To all of you for coming!