Microsoft Windows Azure - Cloud Computing Hosting Environment Presentation
2. Under the Hood: Inside The Cloud Computing Hosting Environment (ES19). Erick Smith, Development Manager, Microsoft Corporation; Chuck Lenzmeier, Architect, Microsoft Corporation
3. Purpose Of This Talk / Agenda
- Introduce the fabric controller
- Introduce the service model
- Give some insight into how it all works
- Describe the workings at the data center level
- Then zoom in to a single machine
4. Deploying A Service Manually
- Resource allocation: machines must be chosen to host the roles of the service (fault domains, update domains, resource utilization, hosting environment, etc.); procure additional hardware if necessary; IP addresses must be acquired
- Provisioning: machines must be set up, virtual machines created, applications configured, DNS set up, and load balancers programmed
- Upgrades: locate the appropriate machines, update the software/settings as necessary, and bring down only a subset of the service at a time
- Maintaining service health: software faults must be handled; hardware failures will occur; logging infrastructure is provided to diagnose issues
- This is ongoing work: you're never done
5. Highly-Available Fabric Controller (diagram)
- The Windows Azure Fabric Controller communicates out-of-band with the hardware (switches, load balancers) for hardware control, and in-band with a control agent on each node for software control
- Each node runs service roles on Windows Server 2008 (with hypervisor); a node can be a VM or a physical machine
6. Windows Azure Automation
- The Fabric Controller (FC) maps declarative service specifications ("what" is needed) to available resources ("make it happen": fabric, switches, load balancers)
- Manages the service life cycle starting from bare metal
- Maintains system health and satisfies SLAs
- What's special about it: model-driven service management, enables a utility-model shared fabric, automates hardware management
7. Fabric Controller
- Owns all the data center hardware and uses the inventory to host services, similar to what a per-machine operating system does with applications
- Provisions the hardware as necessary and maintains its health
- Deploys applications to free resources and maintains the health of those applications
8. Modeling Services (diagram)
- A template automatically maps to a service model: traffic from the public internet reaches a front-end web role through a load balancer, and the web role communicates with a background process role
- Fundamental service-model primitives: load balancer, channel, endpoint, interface, directory, resource
9. What You Describe In Your Service Model
- The topology of your service: the roles and how they are connected
- Attributes of the various components: operating system features required, configuration settings
- The exposed interfaces
- Required characteristics: how many fault/update domains you need, how many instances of each role
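The kinds of facts a service model captures can be sketched as plain data. This is a hypothetical illustration in Python; the field names are illustrative and not Azure's actual schema.

```python
# Hypothetical sketch of a declarative service model: topology (roles and
# channels), per-role attributes, and required characteristics.
service_model = {
    "roles": {
        "web_frontend": {
            "instances": 3,
            "endpoints": [{"name": "http_in", "port": 80, "external": True}],
            "os_features": ["web_server"],
        },
        "background_worker": {
            "instances": 2,
            "endpoints": [],
            "os_features": [],
        },
    },
    # how the roles are connected
    "channels": [("web_frontend", "background_worker")],
    # required characteristics
    "fault_domains": 2,
    "update_domains": 2,
}
```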
10. Fault/Update Domains
- Allow you to specify what portion of your service can be offline at a time
- Fault domains are based on the topology of the data center (e.g., switch failure) and are statistical in nature
- Update domains are determined by what percentage of your service you will take out at a time for an upgrade
- You may experience outages from both at the same time
- The system considers fault domains when allocating service roles (example: don't put all roles in the same rack) and update domains when upgrading a service
- Allocation is spread across fault domains
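The "allocation is spread across fault domains" idea can be sketched as a simple round-robin; this is a hypothetical illustration (the function name and round-robin policy are assumptions, not the FC's actual algorithm).

```python
# Hypothetical sketch: spread role instances across fault domains so a
# single rack/switch failure takes out at most ceil(n/d) instances.
def allocate_across_fault_domains(instance_count, fault_domains):
    placement = {fd: [] for fd in fault_domains}
    for i in range(instance_count):
        fd = fault_domains[i % len(fault_domains)]  # round-robin over domains
        placement[fd].append(f"instance-{i}")
    return placement

placement = allocate_across_fault_domains(5, ["rack_a", "rack_b", "rack_c"])
```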
11. Dynamic Configuration Settings
- Purpose: communicate settings to service roles; there is no "registry" for services
- Application configuration settings: declared by the developer, set by the deployer
- System configuration settings: pre-declared, the same kinds for all roles (instance ID, fault domain ID, update domain ID), assigned by the system
- In both cases, settings are accessible at run time, with call-backs when values change
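The "accessible at run time via call-backs" pattern can be sketched as follows. This is a hypothetical illustration; the class and method names are assumptions, not an Azure API.

```python
# Hypothetical sketch: a settings store that notifies registered
# listeners when the deployer or the system changes a value.
class ConfigStore:
    def __init__(self, settings):
        self._settings = dict(settings)
        self._listeners = []

    def get(self, key):
        return self._settings[key]

    def on_change(self, callback):
        self._listeners.append(callback)

    def update(self, key, value):
        # deployer/system pushes a new value; listeners get a call-back
        old = self._settings.get(key)
        self._settings[key] = value
        for cb in self._listeners:
            cb(key, old, value)

changes = []
cfg = ConfigStore({"instance_id": "0", "update_domain": "1"})
cfg.on_change(lambda k, old, new: changes.append((k, old, new)))
cfg.update("update_domain", "2")
```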
12. Windows Azure Service Lifecycle
- The goal is to automate the life cycle as much as possible
- (Diagram: the developer/deployer is involved only at development and deployment time; the remaining lifecycle stages are automated)
13. Lifecycle Of A Windows Azure Service
- Resource allocation: nodes are chosen based on constraints encoded in the service model (fault domains, update domains, resource utilization, hosting environment, etc.); VIPs/load balancers are reserved for each external interface described in the model
- Provisioning: allocated hardware is assigned a new goal state, and the FC drives the hardware into that goal state
- Upgrades: the FC can upgrade a running service
- Maintaining service health: software faults must be handled; hardware failures will occur; logging infrastructure is provided to diagnose issues
14. Resources Come From Our Shared Pool
- Primary goal: find a home for all role instances; essentially a constraint satisfaction problem
- Allocate instances across fault domains
- Example constraints: only roles from a single service can be assigned to a node; only a single instance of a role can be assigned to a node; the node must have a compatible hosting environment; the node must have enough resources remaining (the service model allows simple hints about the resources a role will use); the node must be in the correct fault domain; only healthy nodes are considered
- A machine can be sub-partitioned into VMs
- Allocation is performed as a transaction
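The example constraints above can be treated as a filter over candidate nodes. This is a hypothetical sketch; the names are illustrative, and the real FC's constraint solver is of course far more involved.

```python
# Hypothetical sketch of the allocation constraints as an eligibility
# predicate, plus a trivial first-fit node picker.
def node_eligible(node, role, assignments):
    assigned = assignments.get(node["id"], [])
    if not node["healthy"]:
        return False
    # only roles from a single service can live on a node
    if any(a["service"] != role["service"] for a in assigned):
        return False
    # only a single instance of a given role per node
    if any(a["name"] == role["name"] for a in assigned):
        return False
    # compatible hosting environment and enough resources remaining
    return node["env"] == role["env"] and node["free_cores"] >= role["cores"]

def pick_node(nodes, role, assignments):
    candidates = [n for n in nodes if node_eligible(n, role, assignments)]
    return candidates[0]["id"] if candidates else None
```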
15. Key FC Data Structures (diagram)
- A service description decomposes into role descriptions and role instance descriptions
- These map onto logical services, logical roles, and logical role instances, which are assigned to logical nodes backed by physical nodes
16. Maintaining Node State (diagram)
- Each logical node (backed by a physical node) tracks its assigned logical role instances, a goal state, and a current state
17. The FC Provisions Machines…
- The FC maintains a state machine for each node; various events cause a node to move into a new state
- The FC maintains a cache of the state it believes each node to be in, reconciled with the true node state via communication with the agent
- The goal state is derived from the node's assigned role instances
- On a heartbeat event, the FC tries to move the node closer to its goal state (if it isn't already there)
- The FC tracks when the goal state is reached; certain events clear the "in goal state" flag
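The heartbeat-driven convergence toward a goal state can be sketched as follows. This is a hypothetical illustration: the state names and one-step-per-heartbeat policy are assumptions, not the FC's actual state machine.

```python
# Hypothetical sketch: each heartbeat advances a node one provisioning
# step toward its goal state, and the FC records when it gets there.
STATES = ["bare", "os_installed", "vm_created", "app_configured", "running"]

def on_heartbeat(node):
    cur, goal = STATES.index(node["current"]), STATES.index(node["goal"])
    if cur < goal:
        node["current"] = STATES[cur + 1]   # one step closer to goal state
    node["in_goal_state"] = node["current"] == node["goal"]

node = {"current": "bare", "goal": "running", "in_goal_state": False}
for _ in range(4):
    on_heartbeat(node)
```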
18. …And Other Data Center Resources
- Virtual IPs (VIPs) are allocated from a pool
- Load balancer (LB) setup: VIP and dedicated IP (DIP) pools are programmed automatically; DIPs are marked in/out of service as the FC's belief about the state of role instances changes
- LB probing is set up to communicate with the agent on the node, which has real-time information on the health of the role; traffic is routed only to roles ready to accept it
- Routing information is sent to the agent to configure routes based on the network configuration
- Redundant network gear is in place for high availability
19. The FC Keeps Your Service Running
- The Windows Azure FC monitors the health of roles: it detects if a role dies, and a role can indicate that it is unhealthy
- Upon learning a role is unhealthy, the current state of the node is updated appropriately and the state machine kicks in again to drive the node back into its goal state
- The FC also monitors the health of the host: if a node goes offline, the FC tries to recover it
- If a failed node can't be recovered, the FC migrates its role instances to a new node: a suitable replacement location is found and existing role instances are notified of the configuration change
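The recovery path for an unrecoverable node can be sketched as follows. This is a hypothetical illustration; the function name, data shapes, and first-healthy-node policy are assumptions (the real FC applies its full allocation constraints when picking a replacement).

```python
# Hypothetical sketch: move role instances off a failed node to a
# healthy replacement and notify surviving nodes of the config change.
def migrate_off_failed_node(failed, nodes, notifications):
    replacement = next(
        (n for n in nodes if n["healthy"] and n["id"] != failed["id"]), None)
    if replacement is None:
        return None                      # no capacity left to recover into
    replacement["instances"].extend(failed["instances"])
    failed["instances"] = []
    for peer in nodes:
        if peer["healthy"]:              # tell surviving instances
            notifications.append((peer["id"], "config changed"))
    return replacement["id"]
```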
20. How Upgrades Are Handled
- The FC can upgrade a running service; resources are deployed to all nodes in parallel
- The upgrade proceeds one update domain at a time; update domains are logical and don't need to be tied to fault domains
- The goal state for a given node is updated when its update domain is reached
- Two modes of operation: manual and automatic
- Rollbacks are achieved with the same basic mechanism
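The one-update-domain-at-a-time ordering can be sketched as a simple loop. This is a hypothetical illustration; the names are assumptions, and the health wait between domains is elided.

```python
# Hypothetical sketch: upgrade update domains in order, so only one
# domain's worth of instances is offline at any time.
def rolling_upgrade(instances_by_domain, apply_update):
    for domain in sorted(instances_by_domain):
        for inst in instances_by_domain[domain]:
            apply_update(inst)          # take down, update, restart
        # a real system would wait for this domain to report healthy
        # before moving to the next one

order = []
rolling_upgrade({1: ["w2", "w3"], 0: ["w0", "w1"]}, order.append)
```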
21. Behind The Scenes Work
- Windows Azure provisions and monitors hardware elements: compute nodes, TOR/L2 switches, load balancers, access routers, and node out-of-band (OOB) control elements
- Hardware life cycle management: burn-in tests, diagnostics, and repair; failed hardware is taken out of the pool, automatic diagnostics are applied, and failed hardware is physically replaced
- Capacity planning: ongoing node and network utilization measurements, and a proven process for bringing new hardware capacity online
22. Service Isolation And Security
- Your services are isolated from other services: they can access only the resources declared in the model (local node resources such as temp storage, and network endpoints)
- Isolation is enforced using multiple mechanisms
- Windows security patches are applied automatically, with rolling operating system image upgrades
23. Windows Azure FC Is Highly Available
- The FC is a cluster of 5-7 replicas with replicated state and automatic failover; a new primary picks up seamlessly from a failed replica
- Even if all FC replicas are down, services continue to function
- Rolling upgrade of the FC itself is supported; the FC cluster is modeled and controlled by a utility "root" FC
- (Diagram: a primary FC node replicates committed state over the replication system to secondary FC nodes' disks, with uncommitted state on the primary; each FC node runs an FC core and object model, and client nodes run an FC agent)
24. Windows Azure Fabric Is Highly Available
- The network has redundancy built in: redundant switches, load balancers, and access routers
- Services are deployed across fault domains, and load balancers route traffic only to active nodes
- Windows Azure FC state is check-pointed periodically and can be rolled back to previous checkpoints, guarding against corrupted FC state, loss of all replicated state, and operator errors
- FC state is stored on multiple replicas across fault domains
25. Service Life-cycle
- PDC release: automated service deployment; three service templates; support for changing the number of running instances; simple service upgrades/downgrades; automated service failure discovery and recovery; an external VIP address/DNS name per service; service network isolation enforcement; automated hardware management, including automated network load-balancer management
- For 2009: the ability to model more complex applications, richer service life-cycle management, and richer network management
26. Summary
- Windows Azure automates most functions: the system takes care of running services and keeping them up
- The service owner stays in control through a self-management portal
- Secure and highly available platform
- Built-in data center management: capacity planning, hardware and network management
28. Virtual Computing Environment
- Multi-tenancy with security and isolation
- Improved performance/watt/$ ratio
- Increased operations automation
- Hypervisor-based virtualization: highly efficient and scalable, leverages hardware advances
29. High-Level Architecture (diagram)
- The host partition runs Server Core as the host OS, with drivers and the virtualization stack provider (VSP)
- Guest partitions run Server Enterprise as the guest OS, with applications and virtualization stack consumers (VSCs)
- Partitions communicate over the VMBus; the hypervisor sits between the partitions and the hardware (CPU, NIC, disks)
30. Image-Based Deployment
- Images are virtual hard disks (VHDs)
- Offline construction and servicing of images
- Separate operating system and service images
- The same deployment model applies to the root partition
31. Image-Based Deployment (diagram)
- On an HV-enabled server: the host partition stacks a maintenance OS and a host-partition differencing VHD on a Server Core base VHD
- Each guest partition stacks an application package and application VHD on a guest-partition differencing VHD, backed by a shared base VHD (Server Enterprise or Server Core)
32. Rapid And Reliable Provisioning
- Deployment of images is just file copy: no installation, done as a background process, using multicast
- Image caching enables quick update and rollback
- Servicing is an offline process
- Dynamic allocation based on business needs
- Net: high availability at lower cost
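The differencing-VHD layering behind this deployment model can be sketched with plain dictionaries. This is a hypothetical illustration of the layering idea only (real VHDs operate at the block level, and all names here are made up).

```python
# Hypothetical sketch of differencing-VHD style layering: reads hit the
# service-specific diff layer first and fall through to the shared base
# image, so deploying a service only copies the (small) diff.
def effective_image(base_layer, diff_layer):
    merged = dict(base_layer)       # shared base VHD contents
    merged.update(diff_layer)       # service-specific differencing VHD wins
    return merged

base = {"os/kernel": "server-core", "config": "default"}
diff = {"config": "service-tuned", "app/service.dll": "v1"}
image = effective_image(base, diff)
```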
33. Windows Azure Compute Instance
- The Tech Preview offers one virtual machine type
- Platform: 64-bit Windows Server 2008; CPU: 1.5-1.7 GHz x64 equivalent; memory: 1.7 GB; network: 100 Mbps; transient local storage: 250 GB; Windows Azure storage also available: 50 GB
- The full service model supports more virtual machine types; expect to see more options post-PDC
34. Windows Azure Virtualization
- Hypervisor: efficient (exploits the latest processor virtualization features, e.g., SLAT and large pages), scalable (NUMA-aware), and small (consumes few resources)
- Host/guest operating system: Windows Server 2008 compatible, optimized for a virtualized environment
- I/O performance is shared equally between virtual machines
35. Second-Level Address Translation
- SLAT requires less hypervisor intervention than expensive shadow page tables (SPT), allowing more CPU cycles to be spent on real work and releasing the memory allocated for SPTs
- SLAT supports large page sizes (2 MB and 1 GB)
36. NUMA Support
- The system is divided into small groups of processors (NUMA nodes)
- Each node has dedicated (local) memory
- Nodes can access memory residing in other nodes (remote), but with extra latency
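A NUMA-aware allocator's preference for local memory can be sketched as follows. This is a hypothetical illustration; the function and policy (fill local first, spill to remote) are assumptions about what "NUMA-aware" means in practice, not the hypervisor's actual allocator.

```python
# Hypothetical sketch: satisfy a VM's memory request from its own NUMA
# node first, spilling to remote nodes only when local memory runs out.
def place_memory(vm_node, needed_mb, free_mb_by_node):
    allocation = {}
    local = min(needed_mb, free_mb_by_node[vm_node])
    if local:
        allocation[vm_node] = local     # prefer low-latency local memory
    remaining = needed_mb - local
    for node, free in free_mb_by_node.items():
        if remaining == 0:
            break
        if node == vm_node or free == 0:
            continue
        take = min(remaining, free)     # remote memory: works, but slower
        allocation[node] = take
        remaining -= take
    return allocation
```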
40. More Hypervisor Optimizations
- Scheduler: tuned for datacenter workloads (ASP.NET, etc.), with more predictability and fairness, and tolerance of heavy I/O loads
- Intercept reduction: spin lock enlightenments, reduced TLB flushes
- VMBus bandwidth improvements
41. Summary
- Automated, reliable deployment: streamlined and consistent, verifiable through offline provisioning
- Efficient, scalable hypervisor: maximizes CPU cycles spent on customer applications, optimized for datacenter workloads
- Reliable and secure virtualization: compute instances are isolated from each other, with predictable and consistent behavior
42. Related Content
- Related PDC sessions: A Lap Around Cloud Services; Architecting Services For The Cloud; Cloud Computing: Programming In The Cloud
- Related PDC labs: Windows Azure Hands-on Labs; Windows Azure Lounge
- Web site: http://www.azure.com/windows
43. Evals & Recordings
- Please fill out your evaluation for this session
- This session will be available as a recording at www.microsoftpdc.com
47. Stay Updated
- Know more about Windows Azure: http://www.microsoft.com/windowsazure/
- Know more about Microsoft Cloud Services: http://www.microsoft.com/india/cloud/
- Request an Enterprise Cloud Assessment workshop: email us at azurepro@microsoft.com
- Follow us