Microsoft.com Design for Resilience The Infrastructure of www.microsoft.com, Microsoft Update, and the Download Center PowerPoint Presentation
Paul Wright, Technology Architect Manager
Microsoft.com Operations
pwright@microsoft.com

Sunjeev Pandey, Senior Director
Microsoft.com Operations
sunjeevp@microsoft.com

Agenda
  • Microsoft.com Introduction
  • Size and Scale
  • Network and System Architecture
  • How Do We Do It?
  • Questions

A Brief History Of Microsoft.com
  • Daily users
    • 1995 – 30K users/day
    • 2001 – 4M unique users/day
    • 2003 – 6.5M unique users/day
    • 2006 – 17.1M unique users/day
  • Milestones
    • Microsoft launches www.microsoft.com
    • Information & support publishing; hosting
    • Microsoft combines Web platform, ops, and content teams
    • Standardization effort begins; consolidation of hosted systems
    • Focus on MSCOM network programming and campaign-to-Web integration
    • Single MSCOM group formed
    • Brand, content, site standards, privacy, brand compliance
    • Enable an innovative customer experience online & in-product
    • Product info, support, Dev / IT Pro experience, customer intelligence, profile management & enterprise downloads

Resiliency vs. Disaster Recovery
  • Type of failover
    • Disaster Recovery: Reactive, Static, Manual, Backup/Restore
    • Resiliency: Proactive, Dynamic, Automatic, Data Mirroring
  • Characteristics of resiliency
    • Pros: Increased Availability; Improved Performance
    • Cons: Higher Initial Costs; More Complexity

Microsoft.com Corporate Reach
  • Reach Overview – June 2006
    • #6 overall site in the U.S.: 55.7M unique users (UU), 36% reach*
    • #4 site worldwide, reaching 248.5M UU**
    • Average of 280M UU/month, July 2005 to June 2006
  • Reach Surpasses All Other Corporate Sites
    • Apple ranked #22: 17.8M UU, 11.5% reach
    • Netscape ranked #67: 9.6M UU, 6.2% reach
    • Sony ranked #217: 3.9M UU, 2.6% reach
    • Sun ranked #307: 3.1M UU, 2.0% reach
    • IBM ranked #485: 2.1M UU, 1.4% reach

(U.S. data provided for relative comparison*)

*Nielsen/NetRatings, June 2006 (unique users in millions)

**Worldwide data from comScore Media Metrix, June 2006 (unique users in millions)

Microsoft.com – Quick Facts

Infrastructure and Application Footprint

  • 6 Internet Data Centers & 3 CDN Partnerships
  • 120+ Web Sites, Thousands of Apps, and 2,138 Databases
  • 120+ Gbit/sec Bandwidth

Solutions at High Scale

  • www.Microsoft.com
    • 17.1M Unique Users/Day & 70M Page Views/Day
    • 10K Requests/Sec and 300K Concurrent Connections on 80 Servers
    • 350 Virtual Roots (vroots), 190 IIS Web Apps & 12 App Pools
  • Microsoft Update
    • 250M Unique Scans/Day, 18K ASP.NET Requests/Sec, 1.1M Concurrent Connections
    • 28.2 Billion Downloads in CY2005
    • Egress – Microsoft, Akamai & Savvis (30–100+ Gbit/sec)
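The daily totals above can be turned into average rates with simple arithmetic. This is our own back-of-the-envelope sketch, not a Microsoft.com tool; it assumes load is spread evenly over the day and across the 80 servers, so real peaks will sit well above these averages. One page view usually triggers several HTTP requests, which is one plausible reason the 10K req/sec figure is far above the average page-view rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Average rate implied by a daily total (ignores peaks and daily cycles)."""
    return per_day / SECONDS_PER_DAY


def per_server(total: float, servers: int) -> float:
    """Share of a load per server, assuming perfectly even load balancing."""
    return total / servers


# www.microsoft.com figures from the slide
page_views_per_sec = per_second(70_000_000)   # ~810 page views/sec on average
reqs_per_server = per_server(10_000, 80)      # 125 requests/sec per server
conns_per_server = per_server(300_000, 80)    # 3,750 concurrent connections per server

# Microsoft Update: 28.2 billion downloads over calendar year 2005 (365 days)
downloads_per_sec = per_second(28.2e9 / 365)  # ~894 downloads/sec on average
```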

Web Site Availability
[Figures: Operations Workbench Plugin; Keynote]
  • Externally Measured by Keynote Systems, Inc.
  • Benchmark Against Other Large Sites
  • Driving Cross-Team Maturity - Positive Trend in Availability:
    • 2003 – 99.70%
    • 2004 – 99.78%
    • 2005 – 99.83%
    • 2006 – 99.87% (YTD)
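To make the availability percentages more concrete, they can be converted into equivalent hours of unavailability per year. This is a rough reading only: Keynote availability is measured from sampled external requests rather than logged outage time, and the 2006 figure is year-to-date.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours of unavailability per year implied by an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


trend = {2003: 99.70, 2004: 99.78, 2005: 99.83, 2006: 99.87}
for year, pct in trend.items():
    # 2003: ~26.3 h, 2004: ~19.3 h, 2005: ~14.9 h, 2006: ~11.4 h
    print(f"{year}: {pct:.2f}% -> {annual_downtime_hours(pct):.1f} h/year")
```

Read this way, moving from 99.70% to 99.87% cuts the equivalent downtime by more than half.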
Web Site Availability
  • Total Errors and Daily Availability of www.microsoft.com – 2006 YTD
    • Constantly monitored and analyzed
    • Corrective actions taken as needed
  • Total Errors, 2006 YTD, Grouped by Error Type
    • Content errors – the #1 hit on availability
    • Only 1.3% of total errors due to server issues (Service Unavailable, Server Error, Connection Reset)

Resilient Against What?
  • Goal: Provide Predictable Service (Cost, Availability, Performance)
  • Areas: Data Center, Infrastructure, Security, Application
  • Threats:
    • Power / Cooling
    • ISP / Telco
    • Virus
    • Unauthorized Access
    • HW / SW Failure
    • DDoS Attack
    • System / Data Corruption


Infrastructure Architecture

  • Technologies: GLBS, DNS, Caching, WALB, DDoS, BGP, Broad Peering, HSRP, OSPF, Spanning Tree, Clustering, WLB


High Availability Architecture - Global Solutions & Networking

  • Global Solutions
    • Content Caching Partners: Akamai & Savvis
    • Global Load Balancing via DNS – Web Cluster Level Mgmt
    • Health Checking and Automatic Fail-over
  • Security Infrastructure
    • Cisco Guards – Anomaly Detection & DoS Filtering
    • Router ACLs Allow HTTP/S Only – Exceptions Require Review
  • Router Architecture – Cookie Cutter
    • Redundant Router and Switch Pairs with VLAN Segregation
    • Simple, Scalable, Manageable, Repeatable
    • Agility – Quickly Repurpose VLANs as Required
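The "HTTP/S only, exceptions require review" rule is a default-deny policy. A minimal Python model of that decision follows. It is a hypothetical illustration of the logic, not Cisco ACL syntax, and the `Packet` type and `reviewed_exceptions` set are names introduced here.

```python
from dataclasses import dataclass

# Hypothetical model: permit only HTTP (80) and HTTPS (443) over TCP.
ALLOWED_TCP_PORTS = frozenset({80, 443})


@dataclass(frozen=True)
class Packet:
    protocol: str  # e.g. "tcp" or "udp"
    dst_port: int


def permitted(pkt: Packet, reviewed_exceptions: frozenset = frozenset()) -> bool:
    """Default deny: allow HTTP/S, plus (protocol, port) pairs that passed review."""
    if pkt.protocol == "tcp" and pkt.dst_port in ALLOWED_TCP_PORTS:
        return True
    return (pkt.protocol, pkt.dst_port) in reviewed_exceptions
```

Keeping exceptions in an explicit, separately reviewed set mirrors the slide's point that anything beyond HTTP/S is the exception rather than the rule.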

High Availability Architecture - Web & Database Hosting

  • Standard Hosting Models
    • Agility - Quickly Reallocate from System to System
    • Efficiency - Fewer Staff & Less Equipment Required
      • Consistent Configurations
      • Repeatable Infrastructure Architecture

High Availability Architecture - Web & Database Hosting

  • Server Configurations
    • Standard Server Hardware – Flexibility
    • Identical Baseline O/S, IIS, ASP.NET Configurations
      • Build Scripts for consistent site builds
    • Application Code & Content Unique per Site
    • File, Registry, Service, and Local Security Attributes Collected for Configuration Auditing and Reporting
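One simple way to use collected file attributes for configuration auditing is to snapshot file hashes on a known-good server and diff later snapshots against that baseline. The sketch below covers file contents only (registry, service, and security attributes would need their own collectors), and `snapshot` and `drift` are hypothetical helper names, not part of any Microsoft.com tool.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def drift(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed, or changed relative to the known-good baseline."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(
            k for k in baseline.keys() & current.keys() if baseline[k] != current[k]
        ),
    }
```

Because every server starts from the same scripted build, one baseline can be compared against the whole fleet.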

High Availability Architecture - Web & Database Hosting

  • Network Load Balancing (NLB) Clusters
    • Main Load Balancing Solution Today
    • Server Cluster Sizes: 3 – 8 Servers/Cluster
    • Positives:
      • Easy Mgmt – Knowledge within Team
      • Free with Windows Server SKUs
    • Challenges:
      • Switch Overhead
      • Connection Affinity
      • Application Layer Switching
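Connection affinity in a software load-balancing cluster can be pictured as every host applying the same deterministic mapping from client IP to host and accepting only the traffic that maps to itself. The sketch below uses a SHA-256 hash modulo the host count. This is a simplified stand-in chosen for illustration, not the actual Windows NLB filtering algorithm, and the host names in the usage are hypothetical.

```python
import hashlib
import ipaddress


def pick_host(client_ip: str, hosts: list[str]) -> str:
    """Map a client IP to one host deterministically (simplified affinity model)."""
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


def accepts(host: str, client_ip: str, hosts: list[str]) -> bool:
    """Each host runs the same mapping on its own and keeps only its share of traffic."""
    return pick_host(client_ip, hosts) == host
```

For example, with `hosts = ["web01", "web02", "web03"]`, exactly one host accepts a given client. Because the mapping is a plain modulo, adding or removing a host remaps most clients, so affinity and cluster-size changes have to be planned together.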

High Availability Architecture - Web & Database Hosting

  • Hardware Load Balancing
    • Limited Use for App Layer Load Balancing
    • Future – Greater Adoption for Non-NLB Features
    • Positives:
      • App Layer Load Balancing
      • Connection Affinity
    • Challenges:
      • Added Complexity/Risks
      • Costs – Hardware & People
High Availability Architecture - Collecting, Monitoring, & Reporting

[Diagram: Tools Services Layer with SMTP, MOM, IMQ, IIS Log, Monitor, GAL, Cluster, Sentinel, Core, SE, Annotations, Perf, IAdmin, Keynote, AD, and Cisco Guard]

High Availability Architecture - Remote Server Management
  • Integrated Lights Out (iLO) from HP
    • Cold Reboot
    • Power On/Off
    • Debugging Over iLO – No More Crash Cart
    • Imaging for Dogfood OS Builds
    • RDP Over iLO
  • Movement to “Lights Out” Datacenter
Global Load Balancing & Caching
  • Health Checking and Fail-over
    • Automated pulling of clusters from rotation at a defined watermark
    • Removal on demand for maintenance
  • Load Shaping & Distribution
    • Control load percentages to specific clusters
    • Region specific traffic distribution
  • Distributing Patches/Files to 300M+ Clients
    • Partnership with 3 Providers
      • Akamai, Savvis, & MSN
    • Load Distributed via Load Balancing
  • Implemented via DNS Resolution and Custom Logic from the CDNs
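The health checking, watermark, maintenance, and load-shaping bullets above can be combined into one selection step that a DNS-based global load balancer might perform per query. This is a simplified sketch under our own assumptions: the `Cluster` fields, the 0.8 default watermark, and weighted random choice are illustrative, and a real GSLB also has to deal with DNS TTLs and resolver caching.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    name: str
    weight: float                  # operator-controlled share of traffic
    healthy: bool = True           # result of the latest health check
    load: float = 0.0              # current utilization, 0.0 to 1.0
    in_maintenance: bool = False   # removed on demand for maintenance


def eligible(cluster: Cluster, watermark: float) -> bool:
    """A cluster stays in rotation only if healthy, not drained, and below the watermark."""
    return cluster.healthy and not cluster.in_maintenance and cluster.load < watermark


def resolve(clusters: list[Cluster], watermark: float = 0.8,
            rng: Optional[random.Random] = None) -> str:
    """Choose which cluster a DNS answer points to, weighted among eligible clusters."""
    if rng is None:
        rng = random.Random()
    pool = [c for c in clusters if eligible(c, watermark) and c.weight > 0]
    if not pool:
        raise RuntimeError("no cluster is eligible to receive traffic")
    names = [c.name for c in pool]
    weights = [c.weight for c in pool]
    return rng.choices(names, weights=weights, k=1)[0]
```

When a cluster is pulled, its share is redistributed across the remaining clusters in proportion to their weights.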

Global Load Balancing & Caching - Intelligent Load Balancing
[Diagram]

Global Load Balancing & Caching - Geo Targeting
  • Load Shaping Based on Client Resolver Location
    • Direct Traffic to Particular Clusters or Caching Provider as Appropriate
    • Customer Experience Enhanced due to Improved Local Proximity
  • Load Shaping Based on Client Location
    • CDN Provider Proxies Requests – Responds with File Based on Location of Client
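A DNS-based geo-targeting decision sees the location of the client's resolver, not of the client itself, which is why CDN proxying based on client location is listed separately. A minimal lookup with a fallback might look like this; the region codes and target names are hypothetical.

```python
# Hypothetical region-to-target table; in practice the region would come from
# geolocating the resolver's IP address.
GEO_TARGETS: dict[str, list[str]] = {
    "na": ["cluster-us-west", "cluster-us-east"],
    "eu": ["cluster-eu"],
}
DEFAULT_TARGETS = ["cdn-partner"]


def targets_for(region: str, healthy: set[str]) -> list[str]:
    """Prefer healthy targets near the resolver's region, else fall back to the default."""
    nearby = [t for t in GEO_TARGETS.get(region, []) if t in healthy]
    if nearby:
        return nearby
    return [t for t in DEFAULT_TARGETS if t in healthy]
```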
SQL Server 2005 Peer-to-Peer Replication
  • Redundancy
    • Each server hosts a copy of the database
  • Availability
    • Individual servers can be patched/upgraded without causing database availability issues
  • Performance
    • Application calls are load balanced between nodes of the cluster for improved scale-out
  • Zero perceived App Downtime
  • Eliminate single point of failure for R/W Databases
  • Considerations:
    • Object names, object schema, and publication names should be identical
    • Publications must allow schema changes to be replicated
    • Updates for a given row should be made only at one database until it has synchronized with its peers
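The last consideration, making updates for a given row at only one database until it has synchronized, is often handled by giving each row a single "owner" peer for writes while reads go to any peer. The sketch below hashes the row key with CRC-32 (stable across processes, unlike Python's built-in `hash`) to pick that owner. The peer names and routing scheme are assumptions for illustration, not the approach described in the talk.

```python
import zlib

PEERS = ["SQL-A", "SQL-B", "SQL-C"]  # hypothetical peer names


def write_owner(row_key: str, peers: list[str] = PEERS) -> str:
    """Send every update for a row to the same peer, so a row is never changed
    at two peers before replication has synchronized them."""
    return peers[zlib.crc32(row_key.encode("utf-8")) % len(peers)]


def read_peer(request_id: int, peers: list[str] = PEERS) -> str:
    """Reads may go to any peer, since each hosts a full copy (round robin here)."""
    return peers[request_id % len(peers)]
```

If an owner peer is taken down for patching, its rows need a new owner, and that switch is only safe once the old owner's changes have replicated to the new one.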
Scaling Out – Real World Implementation
  • Data Center and Geo redundancy
  • Scalable Units
  • Content Publishing
  • WAN Replication
  • End-to-end monitoring
CPU Utilization Per Platform

Comparative Study: x86 vs. x64

Key Takeaways

  • Huge gains from 64-bit hardware & Windows platforms
  • Seamless migration provided by WoW64
  • Enabled www.Microsoft.com to use the freed-up infrastructure for data center redundancy
  • App pool recycles eliminated, thanks to the 4GB virtual address space available to 32-bit processes under WoW64
  • Enabled more app pools, further isolating code & content in shared hosting models

Windows 32-bit vs. 64-bit Comparison: Comparative Study Results – Windows Update Download System Perf

Scenario

Stress generated by live HTTP traffic from Windows Update downloads

32-bit application processes were bottlenecked by the 2GB virtual memory limit; the 4GB available on the 64-bit operating system enabled maximum Mbit/sec throughput

Improved compute times on 64-bit increased requests/sec while lowering concurrent connections (i.e., faster HTTP request processing)
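The link between higher req/sec, fewer concurrent connections, and faster processing is Little's law, L = λW: average concurrency equals arrival rate times average time in the system, so W = L / λ. The numbers below are hypothetical, not results from the study, and the law applies most cleanly to in-flight requests; idle keep-alive connections would inflate the concurrency count.

```python
def mean_time_in_system(concurrent: float, arrivals_per_sec: float) -> float:
    """Little's law, L = lambda * W, solved for W (seconds)."""
    return concurrent / arrivals_per_sec


# Hypothetical before/after numbers, not measurements from the study.
before = mean_time_in_system(concurrent=2_000, arrivals_per_sec=500)  # 4.0 s
after = mean_time_in_system(concurrent=1_500, arrivals_per_sec=600)   # 2.5 s
```

More requests per second served with fewer concurrent connections can only mean each request spends less time in the system.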

Windows 64-bit Analysis Comparative Study Results: www.Microsoft.com Perf

Objective: Stress a live production server to identify its maximum capacity to serve HTTP traffic from www.Microsoft.com client requests
Resources
  • http://blogs.technet.com/mscom
  • http://blogs.msdn.com/mscomts
R/O NLB SQL Cluster
  • Redundancy - Each server hosts a copy of the database
    • SQL1 – Read/Write
    • SQL2 & SQL3 – Read-Only
  • Availability
    • Individual servers can be patched/upgraded without causing database availability issues
  • Performance
    • Application calls are load balanced between nodes of the cluster for improved scale-out
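Application-side read/write splitting for a cluster like this can be sketched as a small router: writes go to the single read/write node and reads rotate across the read-only copies. This is a hypothetical illustration; classifying statements by a leading `SELECT` is deliberately naive, and reads from read-only copies may lag behind recent writes.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the read/write node and spread reads across read-only copies."""

    def __init__(self, writer: str, readers: list[str]):
        self.writer = writer
        self._readers = itertools.cycle(readers)

    def route(self, sql: str) -> str:
        # Naive classification by leading keyword; a real router would not parse SQL this way.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.writer
```

For the cluster above this would be `ReadWriteRouter("SQL1", ["SQL2", "SQL3"])`; SQL1 could also be added to the reader list if it has spare capacity.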
R/W NLB SQL Cluster

  • Redundancy - Each server hosts a copy of the database
    • SQL1 – Read/Write (Consolidator)
    • SQL2 – Primary Read/Write (Active)
    • SQL3 – Log Shipping Secondary (Standby)
  • Availability
    • Single point of failure
    • Manual failover – takes minutes to complete
  • Performance
    • Application calls to a database are not load balanced between the nodes of the cluster

Mirroring (SQL 2005 SP1)

  • Mirroring – highest-availability writes
  • Log shipping for DC redundancy
  • Reduced failover downtime from a 10 min average to <1 min (planned)
  • Considerations:
    • Works on a per-database basis, for databases in the full recovery model
    • Only one copy of the database (the principal) is available to clients at any time
    • Supports two partners and an optional "witness" server for automatic failover
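With a witness, in high-safety (synchronous) mode, automatic failover follows a quorum rule: the mirror takes over only when it is synchronized, can still reach the witness, and neither it nor the witness can reach the principal. The function below is a simplified model of that rule with hypothetical parameter names; it ignores timeouts and the details of SQL Server's partner/witness protocol.

```python
def should_auto_failover(mirror_sees_principal: bool,
                         witness_sees_principal: bool,
                         mirror_sees_witness: bool,
                         mirror_synchronized: bool) -> bool:
    """Simplified mirror-side decision for automatic failover with a witness."""
    principal_lost = not mirror_sees_principal and not witness_sees_principal
    has_quorum = mirror_sees_witness  # mirror + witness form a majority of three
    return mirror_synchronized and has_quorum and principal_lost
```

Requiring the witness to agree is what keeps a network split between the two partners from producing two principals.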

TCP Improvements – Client Testing

What Exactly Changed?
  • Compound TCP (CTCP) – controls the TCP send window size; relevant when Longhorn (LH) is the server
  • Receive Window Auto-Tuning – controls the TCP receive window size; relevant when Vista is the client

Test Scenario
  • Clients: dual-boot client (XP SP2 & Vista build 5308)
  • Test: download (EN W2KSP4, ~135MB) from 4 locations (Tukwila, Bay, Florida & Frankfurt)

Results
  • Corporate network environment – direct Internet connectivity (high speed, low packet loss)
    • 5–7% relative speed gain in low-latency scenarios (2–20 ms RTT)
    • >150% relative speed gain in mid- to high-latency scenarios (80–180 ms RTT)
  • Home network environment (Comcast cable modem)
    • ~40% relative speed gain (16–330 ms RTT)
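One standard explanation for the latency-dependent gains is the bandwidth-delay product: a single TCP connection can send at most one receive window per round trip, so a fixed window caps throughput harder as RTT grows. The sketch assumes a fixed 65,535-byte window (the largest possible without TCP window scaling) and ignores packet loss and congestion control, which CTCP targets separately.

```python
def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-connection TCP throughput: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


def window_needed_bytes(rate_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain rate_mbps at rtt_ms."""
    return rate_mbps * 1e6 / 8 * (rtt_ms / 1000)


for rtt in (2, 20, 80, 180):
    # 2 ms: ~262.1, 20 ms: ~26.2, 80 ms: ~6.6, 180 ms: ~2.9 Mbit/s
    print(f"{rtt:>3} ms RTT: <= {max_throughput_mbps(65_535, rtt):7.1f} Mbit/s with a 64 KB window")
```

At low RTT the fixed window is rarely the bottleneck, which fits the small 5–7% gain; at 80–180 ms it caps a connection at a few Mbit/s, while sustaining, say, 100 Mbit/s at 160 ms would need about 2,000,000 bytes in flight.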

TCP/IP Throughput Improvements

  • Server-to-server transfer over a 20 ms RTT link
    • W2K3 to W2K3: 10–12 Mbps
    • Longhorn to Longhorn: >300 Mbps
  • Vista client Internet download speeds at 160 ms RTT: >2x