Download
1 / 44

- PowerPoint PPT Presentation


  • 285 Views
  • Uploaded on

COS326 Inside Windows Azure: The Cloud Operating System. Mark Russinovich Technical Fellow Windows Azure. Agenda. Introduction to the Cloud Windows Azure Fundamentals Fabric Controller Internals Updating a Service Host OS Upgrades Service Healing. Cloud Fundamentals.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about '' - palmer


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Cos326 inside windows azure the cloud operating system l.jpg

COS326Inside Windows Azure:The Cloud Operating System

Mark Russinovich

Technical Fellow

Windows Azure


Agenda l.jpg
Agenda

  • Introduction to the Cloud

  • Windows Azure Fundamentals

  • Fabric Controller Internals

  • Updating a Service

  • Host OS Upgrades

  • Service Healing


Cloud fundamentals l.jpg
Cloud Fundamentals

  • Infrastructure as a Service (IaaS): basic compute and storage resources

    • On-demand servers

    • Amazon EC2, VMWarevCloud

  • Platform as a Service (PaaS): cloud application infrastructure

    • On-demand application-hosting environment

    • E.g. Google AppEngine, Salesforce.com, Windows Azure

  • Software as a Service (SaaS): cloud applications

    • On-demand applications

    • E.g. GMail, Microsoft Office Web Companions


The benefits of the cloud l.jpg
The Benefits of the Cloud

  • The Cloud is about cheap, on-demand capacity

Windows Azure


Windows azure l.jpg
Windows Azure

  • Windows Azure is an OS for the data center

    • Model: Treat the data center as a machine

    • Handles resource management, provisioning, and monitoring

    • Manages application lifecycle

    • Allows developers to concentrate on business logic

  • Provides shared pool of compute, disk and network

    • Virtualized storage, compute and network

    • Illusion of boundless resources

  • Provides common building blocks for distributed applications

    • Reliable queuing, simple structured storage, SQL storage

    • Application services like access control and connectivity



Agenda8 l.jpg
Agenda

  • Introduction to the Cloud

  • Windows Azure Fundamentals

  • Fabric Controller Internals

  • Deploying a Service

  • Updating a Service

  • Host OS Upgrades

  • Service Healing


Basic windows azure functionality l.jpg
Basic Windows Azure Functionality

  • Configuration and deployment:

    • Certificate management (e.g. SSL)

    • Load-balanced public endpoints

    • Internal endpoint configuration and discovery

  • Operations:

    • Remote desktop access management

    • Automated OS and runtime updates

    • Coordinated updates

  • Availability:

    • Health monitoring

    • SLA guaranteed uptime


Modeling cloud applications l.jpg
Modeling Cloud Applications

  • A cloud application is typically made up of different components

    • Front end: e.g. load-balanced stateless web servers

    • Middle worker tier: e.g. order processing, encoding

    • Backend storage: e.g. SQL tables or files

    • Multiple instances of each for scalability and availability

Front-End

Windows

Azure

Storage,SQL Azure

HTTP/HTTPS

Front-End

Middle-Tier

Load Balancer

Mark’s Cloud Application


The windows azure service model l.jpg
The Windows Azure Service Model

  • A Windows Azure application is called a “service”

    • Definition information

    • Configuration information

    • At least one “role”

  • Roles are like DLLs in the service “process”

    • Collection of code with an entry point that runs in its own virtual machine

  • There are currently three role types:

    • Web Role: IIS7 and ASP.NET in Windows Azure-supplied OS

    • Worker Role: arbitrary code in Windows Azure-supplied OS

    • VM Role: uploaded VHD with customer-supplied OS


Role contents l.jpg
Role Contents

  • Definition:

    • Role name

    • Role type

    • VM size (e.g. small, medium, etc.)

    • Network endpoints

  • Code:

    • Web/Worker Role: Hosted DLL and other executables

    • VM Role: VHD

  • Configuration:

    • Number of instances

    • Number of update and fault domains

Mark’s Service

Role: Front-End

Definition

Type: Web

VM Size: Small

Endpoints: External-1

Configuration

Instances: 2

Update Domains: 2

Fault Domains: 2

Role: Middle-Tier

Definition

Type: Worker

VM Size: Large

Endpoints: Internal-1

Configuration

Instances: 3

Update Domains: 2

Fault Domains: 2


Service model files l.jpg
Service Model Files

  • Service definition is in ServiceDefinition.csdef

  • Service configuration is in ServiceConfiguration.cscfg

  • CSPack program Zips service binaries and definition into service package file (service.cscfg)


Availability update domains l.jpg
Availability: Update Domains

Front-End-1

Middle Tier-3

Middle Tier-1

Middle Tier-2

Front-End-2

  • Purpose: Ensure service stays up while updating and Windows Azure OS updates

  • System considers update domains when upgrading a service

    • 1/Update domains = percent of service that will be offline

    • Default and max is 5, but you can override with upgradeDomainCount service definition property

  • The Windows Azure SLA is based on at least two update domains and two role instances in each role

Front-End-1

Front-End-2

Middle Tier-1

Middle Tier-2

Middle Tier-3

Update Domain 1

Update Domain 2

Update Domain 3


Availability fault domains l.jpg
Availability: Fault Domains

  • Purpose: Avoid single points of failures

    • Similar concept to update domains

    • But you don’t control the updates

  • Unit of failure based on data center topology

    • E.g. top-of-rack switch on a rack of machines

  • Windows Azure considers fault domains when allocating service roles

    • 2 fault domains per service

    • Will try and spread roles out across more

    • E.g. don’t put all roles in same rack

Front-End-1

Front-End-2

Middle Tier-1

Middle Tier-2

Middle Tier-3

Fault Domain 1

Fault Domain 2

Fault Domain 3


Deploying a service to the cloud the 10 000 foot view l.jpg
Deploying a Service to the Cloud:The 10,000 foot view

Service

  • Service package uploaded to portal

    • Windows Azure Portal Service passes service package to “Red Dog Front End” (RDFE) Azure service

    • RDFE converts service package to native “RD” version

  • RDFE sends service to Fabric Controller (FC) based on target region

  • FC stores image in repository and deploys and activates service

Portal Service

RDFE

Service

US-North Central Datacenter

FC


Agenda17 l.jpg
Agenda

  • Introduction to the Cloud

  • Windows Azure Fundamentals

  • Fabric Controller Internals

  • Deploying a Service

  • Updating a Service

  • Host OS Upgrades

  • Service Healing


The fabric controller fc l.jpg
The Fabric Controller (FC)

  • The “kernel” of the cloud operating system

    • Manages datacenter hardware

    • Manages Windows Azure services

  • Four main responsibilities:

    • Datacenter resource allocation

    • Datacenter resource provisioning

    • Service lifecycle management

    • Service health management

  • Inputs:

    • Description of the hardware and network resources it will control

    • Service model and binaries for cloud applications

Datacenter

Fabric Controller

Service

Server

Kernel

Process

Word

SQL Server

Exchange

Online

SQL Azure

Windows Kernel

Fabric Controller

Server

Datacenter


Sidebar what s with all these fabrics l.jpg
Sidebar: What’s with all these “Fabrics”?

  • The Windows Azure Fabric Controller is totally, completely, unrelated to AppFabric

  • AppFabricis a brand that encompasses:

    • Windows Server AppFabric: a set of components for building composite applications based on Windows Communication Foundation and Windows Workflow

    • Windows Azure AppFabric: Cloud services for connecting cloud and on-premise applications

      • AppFabricAccess Control Server

      • AppFabric Service Bus

      • AppFabric Cache

      • Built as Windows Azure services


Datacenter architecture l.jpg
Datacenter Architecture

Datacenter Routers

Aggregation Routers and

Load Balancers

Agg

Agg

Agg

Agg

Agg

Agg

LB

LB

LB

LB

LB

LB

LB

LB

LB

LB

LB

LB

Top of Rack

Switches

TOR

TOR

TOR

TOR

TOR

TOR

TOR

TOR

TOR

TOR

TOR

TOR

TOR

TOR

TOR

Racks

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

Nodes

PDU

PDU

PDU

PDU

PDU

PDU

PDU

PDU

PDU

PDU

PDU

PDU

PDU

PDU

PDU

Power Distribution Units



High level fc architecture l.jpg
High-Level FC Architecture

  • FC is a distributed, stateful application running on nodes (blades) spread across fault domains

    • Installed by “Utility” Fabric Controller

    • One acts as the primary and all others keep view of world in sync

    • Supports rolling upgrade, and services continue to run even if FC fails entirely

TOR

TOR

TOR

TOR

TOR

AGG

LB

LB

LB

LB

LB

FC3

FC5

FC2

FC4

FC1

FC3

Nodes

Rack


Provisioning a node l.jpg
Provisioning a Node

Fabric Controller

Image Repository

  • Power on node

  • PXE-boot Maintenance OS

  • Agent formats disk and downloads Host OS

  • Host OS boots, runs Sysprep /specialize, reboots

  • FC connects with the “Host Agent”

PXE

Server

Maintenance OS

Windows Azure

OS

Maintenance OS

Parent

OS

Role

Images

Role

Images

Role

Images

Role

Images

Node

Windows

Azure

OS

FC Host Agent

Windows Azure Hypervisor


Agenda24 l.jpg
Agenda

  • Introduction to the Cloud

  • Windows Azure Fundamentals

  • Fabric Controller Internals

  • Deploying a Service

  • Updating a Service

  • Host OS Upgrades

  • Service Healing


Service deployment steps l.jpg
Service Deployment Steps

  • Process service model files

    • Determine resource requirements

    • Create role images

  • Allocate compute and network resources

  • Prepare nodes

    • Place role images on nodes

    • Create virtual machines

    • Start virtual machines and roles

  • Configure networking

    • Dynamic IP addresses (DIPs) assigned to blades

    • Virtual IP addresses (VIPs) allocated and mapped to sets of DIPs

    • Programs load balancers to allow traffic


Service resource allocation l.jpg
Service Resource Allocation

  • Goal: allocate service components to available resources while satisfying all hard constraints

    • Scale requirement: # of instances

    • HW requirements: CPU, Memory, Storage, Net

    • Hosting environment requirements (OS, VM)

    • Fault domains

    • Update domains

  • Secondary goal: Satisfy soft constraints

    • Prefer allocations which will simplify servicing the host OS/hypervisor

    • Optimize network proximity

  • Service allocation produces the goal state for the resources assigned to the service components

    • Node and VM configuration (OS, hosting environment)

    • Images and configuration files to deploy

    • Processes to start

  • Service allocation also allocates network resources such as LB and VIPs


Example service allocation l.jpg
Example Service Allocation

Role B

Count: 2

Update Domains: 2

Fault Domains: 2

Size: Medium

Role A

Count: 3

Update Domains: 2

Fault Domains: 2

Size: Large

www.mycloudapp.net

www.mycloudapp.net

Load

Balancer

10.100.0.185

10.100.0.36

10.100.0.122

Fault Domain 1

Fault Domain 2

Fault Domain 3


Provisioning a role instance l.jpg
Provisioning a Role Instance

  • FC pushes role files and configuration information to target node host agent

  • Host agent creates three VHDs:

    • Differencing VHD for OS image (D:\)

      • Host agent injects FC guest agent into VHD for Web/Worker roles

    • Resource VHD for temporary files (C:\)

    • Role VHD for role files (first available drive letter e.g. E:\, F:\)

  • Host agent creates VM, attaches VHDs, and starts VM

  • Guest agent starts role host, which calls role entry point

    • Starts health heartbeat to and gets commands from host agent

  • Load balancer only routes to external endpoint when it responds to simple HTTP GET (LB probe)


Provisioning vm role instances l.jpg
Provisioning VM Role Instances

  • VM Role base and differencing VHD are stored in Windows Azure Storage blobs

    • Shadow versions are made when the originals are uploaded

    • VHD reads all go through a VHD caching service

  • Reads come on-demand from the cache

  • Writes go to a secondary differencing VHD

    • “Reimage” simply deletes it and reboots

Secondary Differencing VHD

VHD Caching

Service

RDFE

Shadow Differencing VHD

Base VHD

Windows Azure Blob Storage

Original Differencing VHD

Shadow Differencing VHD

Original Base VHD

Shadow Base VHD

Node


Inside a role vm l.jpg
Inside a Role VM

OS Volume

Resource Volume

Role Volume

Guest Agent

Role Host

Role Entry Point


Fabric controller security l.jpg
Fabric Controller Security

  • The VM is the security boundary upon which Windows Azure security is based

    • The host OS and FC host agent are trusted

    • The guest agent is untrusted

    • The FC host agent ensures that the VM can only access IP addresses assigned to VMs of the same service

      • Allows access to Internet addresses

  • FC uses certificates and network security to authorize access to datacenter resources


Agenda32 l.jpg
Agenda

  • Introduction to the Cloud

  • Windows Azure Fundamentals

  • Fabric Controller Internals

  • Deploying a Service

  • Updating a Service

  • Host OS Upgrades

  • Service Healing


Update types l.jpg
Update Types

Role A

UD 1

Role A

UD 2

  • There are two update types:

    • In-place

    • VIP swap

  • In-place update:

    • Supports changes to configuration or binaries, not service definition

    • Role instances upgraded one update domain at a time

    • Two modes: automatic and manual

  • VIP swap update:

    • Service definition can change, but external endpoints must remain the same

    • New version of service deployed, external VIP/DIP mapping swapped with old

  • Changes to external endpoint count require a new deployment

Role A

UD 1

Role A

UD 1

Role A

UD 1

Role A

UD 2

Role A

UD 2

Role A

UD 2

Role B

UD 1

Role B

UD 2

In-Place Update

Role B

UD 1

Role B

UD 1

Role B

UD 1

Role B

UD 2

Role B

UD 2

Role B

UD 2

LB

VIP Swap Update


In place update detail l.jpg
In-Place Update Detail

  • FC deploys updated role files and configuration to all nodes in parallel

  • Prepares new role instances:

    • FC host agent creates new role VHD

    • Attaches and mounts new role VHD

  • Stops old role instance:

    • FC instructs guest agent to stop role instance

    • Dismounts and detaches old role VHD

  • Starts new role instances:

    • Calls new role code entry point

    • Considers role instance update successful when role code reports “ready”

  • Note that resource volume is preserved updates of role instance


Agenda35 l.jpg
Agenda

  • Introduction to the Cloud

  • Windows Azure Fundamentals

  • Fabric Controller Internals

  • Deploying a Service

  • Updating a Service

  • Host OS Upgrades

  • Service Healing


Updating the host os l.jpg
Updating the Host OS

  • Initiated by the Windows Azure team

    • Typically no more than once per month

  • Goal: update all machines as quickly as possible

  • Constraint: must not violate service SLA

    • Service needs at least two update domains and role instances for SLA

    • Can’t allow more than one update domain of any service to be offline at a time

  • Note: your role instance keeps the same VM and VHDs, preserving cached data in the resource volume

  • Essentially a graph coloring problem

    • Edges exist between vertices (nodes) if the two nodes host instances of the same service role in different update domains

    • Nodes that don’t have edges between them can update in parallel


Example allocations l.jpg
Example Allocations

  • Both allocations are valid from the services point of view

    • Allocation 1 allows for 2 nodes rebooting simultaneously

    • Allocation 2 allows only one node to be down at any time

  • Host OS upgrade rollout is 2x faster with allocation 1

Service A

Role A-1

UD 1

Service A

Role A-1

UD 1

Service B

Role A-1

UD 1

Service A

Role A-1

UD 2

Service A

Role A-1

UD 2

Service B

Role A-1

UD 2

Service A

Role B-1

UD 1

Service A

Role B-1

UD 1

Service B

Role B-1

UD 1

Service B

Role B-1

UD 1

Service A

Role B-2

UD 2

Service A

Role B-2

UD 2

Service B

Role B-2

UD 2

Service B

Role B-2

UD 2

Allocation 1

Service B

Role A-1

UD 1

Service B

Role A-1

UD 2

Allocation 2


Agenda38 l.jpg
Agenda

  • Introduction to the Cloud

  • Windows Azure Fundamentals

  • Fabric Controller Internals

  • Deploying a Service

  • Updating a Service

  • Host OS Upgrades

  • Service Healing


Node and role health maintenance l.jpg
Node and Role Health Maintenance

  • FC maintains service availability by monitoring the software and hardware health

    • Based primarily on heartbeats

    • Automatically “heals” affected roles


Node health index l.jpg
Node Health Index

  • Timeouts vary depending on node state and operation

  • Based on heartbeats, which are typically 15 seconds

    • Used for status and recovery

    • Health state sampler resets the index on successful poll

    • Once the index falls below zero, FC attempts to heal node

    • For example, host agent timeout is 10 minutes

  • Worst-case reaction time is timeout interval + heartbeat interval

Healthy

Missed

Heartbeats

Health

Timeout

Missed

Heartbeat

Recovery

Initiated

Node

Health

Index

Heartbeat

Interval

Heartbeat

Timeout


Moving a role instance l.jpg
Moving a Role Instance

  • Moving a role instance is similar to a service update

  • On source node:

    • Role instances stopped

    • VMs stopped

    • Node reprovisioned

  • On destination node:

    • Same steps as initial role instance deployment

  • Warning: Resource VHD is not moved


Conclusion l.jpg
Conclusion:

  • Platform as a Service is all about reducing management and operations overhead

  • The Windows Azure Fabric Controller is the foundation for Windows Azure’s PaaS

    • Provisions machines

    • Deploys services

    • Configures hardware for services

    • Monitors service and hardware health

    • Performs service healing

  • The Fabric Controller continues to evolve


Session evaluations l.jpg

Session Evaluations

Tell us what you think, and you could win!

All evaluations submitted are automatically entered into a daily prize draw* 

Sign-in to the Schedule Builder at http://europe.msteched.com/topic/list/

* Details of prize draw rules can be obtained from the Information Desk.


Slide44 l.jpg

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.


ad