Microsoft in the Enterprise

Windows Scalability: Technology, Challenges and Limitations

Andreas Kampert


Agenda

  • Scale-up and Scale-out

  • Scale-Up

    • CPU, Memory, Disks

    • What does this mean for Windows applications

  • Scale-Out

    • Clones

    • Partitioning

  • Scale-Up and Scale-Out together

    • Application example: Siebel Enterprise Application


Scalable Systems

  • Scale UP: grow by adding components to a single system

  • Scale Out: grow by adding more systems



Everything starts with understanding your computer

[Diagram labels: CPU 0 to CPU 3, Main Memory, System Bus, PCI Bus 1, PCI Bus 2, Controllers]



The Memory Hierarchy

  • Locality REALLY matters

  • CPU at 2 GHz, RAM at 5 MHz: RAM is no longer random access

  • Organizing the code gives 3x (or more)

  • Organizing the data gives 3x (or more)

  • Latency by level (clocks)

    • Registers: 1

    • L1: 2

    • L2: 10

    • L3: 30

    • Near RAM: 100

    • Far RAM: 300
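
To put the latency figures in time units, here is a minimal sketch that assumes the slide's 2 GHz clock. The names `LATENCY_CLOCKS`, `latency_ns` and `dependent_accesses_per_second` are illustrative and not part of the presentation.

```python
CLOCK_HZ = 2_000_000_000  # 2 GHz CPU, as stated on the slide

# Latency in CPU clocks per level, copied from the slide
LATENCY_CLOCKS = {
    "Registers": 1,
    "L1": 2,
    "L2": 10,
    "L3": 30,
    "Near RAM": 100,
    "Far RAM": 300,
}


def latency_ns(clocks, clock_hz=CLOCK_HZ):
    """Convert a latency given in clocks to nanoseconds."""
    return clocks * 1e9 / clock_hz


def dependent_accesses_per_second(clocks, clock_hz=CLOCK_HZ):
    """Rate of back-to-back dependent accesses a level can sustain."""
    return clock_hz / clocks


for level, clocks in LATENCY_CLOCKS.items():
    rate_millions = dependent_accesses_per_second(clocks) / 1e6
    print(f"{level:10s} {latency_ns(clocks):6.1f} ns  {rate_millions:8.1f} M accesses/s")
```

Under these assumptions, far RAM at 300 clocks costs 150 ns per access, or roughly 6.7 million dependent accesses per second, which is the same order of magnitude as the "RAM at 5 MHz" remark.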


32-bit Windows Virtual Address Space

00000000

Application Code

Global Variables

.DLL code

Unique per process, accessible in user or kernel mode

3 GB allows

Extension

Requires:

Boot.ini Setting

plus

large_address_aware

7FFFFFFF

80000000

Exec, Kernel, HAL, drivers, per-thread kernel mode stacks, Win32K.Sys

File system cache

Paged pool

System PTEs

Non-paged pool…

Per process, accessible only in kernel mode

C0000000

Process page tables, hyperspace

System wide, accessible only in kernel mode

FFFFFFFF
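
The user/kernel split in the address map can be sketched as a small lookup. This is only an illustration: it assumes the 3 GB extension moves the top of user space to just below C0000000, and that both conditions named on the slide (the Boot.ini setting and a large_address_aware application) must hold. The function names are hypothetical.

```python
DEFAULT_USER_TOP = 0x7FFFFFFF   # top of the 2 GB user range shown on the slide
EXTENDED_USER_TOP = 0xBFFFFFFF  # assumed top of user space with the 3 GB extension


def user_space_top(boot_3gb, large_address_aware):
    """Highest user-mode address for a process.

    The larger range applies only when the system is booted with the
    3 GB setting AND the application is marked large_address_aware.
    """
    if boot_3gb and large_address_aware:
        return EXTENDED_USER_TOP
    return DEFAULT_USER_TOP


def is_user_address(address, boot_3gb=False, large_address_aware=False):
    """True if the 32-bit address falls in the per-process user range."""
    return 0 <= address <= user_space_top(boot_3gb, large_address_aware)


print(is_user_address(0x80000000))                                   # False
print(is_user_address(0x80000000, boot_3gb=True))                    # False
print(is_user_address(0x80000000, boot_3gb=True,
                      large_address_aware=True))                     # True
```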


Memory Mapping

[Diagram: the user address space and system address space of Process 1 and Process 2 (virtual memory) map onto physical memory and the pagefile(s).]


Physical Address Extension for IA32

  • PAE is required when using more than 4 GB of physical memory

  • Makes additional memory available to the OS

    • Has no impact on applications

    • Applications require AWE to use it (see later)

  • Enabling PAE

[boot loader]

timeout=30

default=multi(0)disk(0)rdisk(0)partition(1)\WINNT

[operating systems]

multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows PAE" /PAE


Address Windowing Extensions (AWE) APIs

  • Allows applications to bypass the 4 GB limit

  • Advantages of the AWE APIs

  • Small API set utilizing a windowing technique

    • VirtualAlloc() with the MEM_PHYSICAL flag

    • AllocateUserPhysicalPages()

    • MapUserPhysicalPages()

    • FreeUserPhysicalPages()
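
The windowing idea behind these calls can be illustrated with a toy model. This is not the Win32 API: the class `WindowedMemory` and its methods are hypothetical and only mimic the roles of AllocateUserPhysicalPages() and MapUserPhysicalPages(), namely a small set of virtual slots being re-pointed at a larger pool of physical pages.

```python
PAGE_SIZE = 4096  # bytes per page (assumed page size for the model)


class WindowedMemory:
    """Toy model: a small virtual window over a larger pool of physical pages."""

    def __init__(self, window_slots):
        self.window = [None] * window_slots  # slot index -> physical page number
        self.pages = []                      # physical pages, indexed by page number

    def allocate_pages(self, count):
        """Reserve physical pages (the role AllocateUserPhysicalPages() plays)."""
        first = len(self.pages)
        self.pages.extend(bytearray(PAGE_SIZE) for _ in range(count))
        return list(range(first, first + count))

    def map_pages(self, first_slot, page_numbers):
        """Point window slots at physical pages (the role of MapUserPhysicalPages())."""
        for offset, page in enumerate(page_numbers):
            self.window[first_slot + offset] = page

    def _page_for(self, slot):
        page = self.window[slot]
        if page is None:
            raise ValueError("window slot is not mapped")
        return self.pages[page]

    def write(self, slot, offset, data):
        self._page_for(slot)[offset:offset + len(data)] = data

    def read(self, slot, offset, length):
        return bytes(self._page_for(slot)[offset:offset + length])


mem = WindowedMemory(window_slots=2)   # tiny window
pages = mem.allocate_pages(8)          # more physical pages than window slots
mem.map_pages(0, [pages[5]])
mem.write(0, 0, b"hello")
mem.map_pages(0, [pages[6]])           # remap slot 0; page 5 keeps its contents
mem.map_pages(1, [pages[5]])
print(mem.read(1, 0, 5))               # b'hello'
```

The point of the sketch is that data stays in the physical page while the mapping changes, so an application can reach more memory than its virtual window holds at any one time.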


AWE Mechanism

[Diagram: the application's 2 GB (or 3 GB) virtual address space contains an AWE region allocated using VirtualAlloc(); physical memory is allocated with AllocateUserPhysicalPages() and mapped into the AWE region with MapUserPhysicalPages().]


Hot-Add Memory

  • Requires

    • Hardware and

    • BIOS support

      • SRAT

      • ACPI 2.0

      • Reporting memory at POST



Thread Scheduling

[Figure: thread priority levels, numbered 0 to 31]

  • Priority driven, preemptive

    • No attempt to share processors “fairly” among processes, only among threads

    • Event-driven; no guaranteed execution period before preemption

  • Time-sliced, round-robin within a priority level

  • Simultaneous thread execution on MP systems

    • Any processor can interrupt another processor to schedule a thread

    • Tries to keep threads on same CPU (“ideal processor”)
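
The first two rules above (priority driven, round-robin within a level) can be sketched as a toy single-processor scheduler. This is a simplified model for illustration only: the class and method names are hypothetical, and it ignores preemption events, priority boosts and multiprocessor placement.

```python
from collections import deque


class Scheduler:
    """Toy scheduler: highest priority first, round-robin within a level."""

    LEVELS = 32  # priority levels 0..31

    def __init__(self):
        self.ready = [deque() for _ in range(self.LEVELS)]

    def make_ready(self, thread, priority):
        """Append a thread to the tail of its priority level's ready queue."""
        self.ready[priority].append(thread)

    def pick_next(self):
        """Return (thread, priority) from the highest non-empty level, or None."""
        for priority in range(self.LEVELS - 1, -1, -1):
            if self.ready[priority]:
                return self.ready[priority].popleft(), priority
        return None

    def run(self, quanta):
        """Run for a number of time slices and return the execution order."""
        order = []
        for _ in range(quanta):
            picked = self.pick_next()
            if picked is None:
                break
            thread, priority = picked
            order.append(thread)
            self.make_ready(thread, priority)  # quantum expired: back to the tail
        return order


s = Scheduler()
s.make_ready("A", 8)
s.make_ready("B", 8)
s.make_ready("C", 4)
print(s.run(4))  # ['A', 'B', 'A', 'B']
```

Thread C never runs while A and B stay ready at a higher priority, which is the "no attempt to share fairly" behaviour in miniature.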



Affinity

  • Threads can run on any CPU, unless affinity specified otherwise

    • Affinity specified by a bit mask

    • Each bit corresponds to a CPU number

  • Thread affinity mask must be subset of process affinity mask, which in turn must be a subset of the active processor mask

  • “Hard Affinity” can lead to threads getting less CPU time than they normally would

    • More applicable to large MP systems running dedicated server apps
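
The bit-mask rules above can be checked with a few lines of integer arithmetic. This is a sketch with illustrative helper names; it does not call any Windows API.

```python
def mask_from_cpus(cpus):
    """Build an affinity mask with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def cpus_from_mask(mask):
    """List the CPU numbers whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def is_subset(inner, outer):
    """True if every CPU allowed by `inner` is also allowed by `outer`."""
    return inner & ~outer == 0


active = mask_from_cpus(range(8))      # 8 active processors -> 0xFF
process = mask_from_cpus([0, 1, 2, 3])  # process limited to CPUs 0-3 -> 0x0F
thread = mask_from_cpus([2, 3])         # thread limited to CPUs 2-3 -> 0x0C

print(hex(active), hex(process), hex(thread))
print(is_subset(thread, process) and is_subset(process, active))  # True
print(is_subset(mask_from_cpus([5]), process))                    # False
```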


Disks Are Becoming Tapes

  • Capacity:

    • 150 GB, 300 GB, 2 TB

  • Bandwidth:

    • 40 MBps, 150 MBps

  • Read time

    • 150 GB disk (150 IO/s, 40 MBps): 2 hours sequential, 2 days random

    • 1 TB disk (200 IO/s, 150 MBps): 4 hours sequential, 12 days random


Amdahl’s Balanced System Laws

  • 1 mips needs 1 MB ram and needs 20 IO/s

  • At 1 billion instructions per second

    • need 4 GB/cpu

    • need 50 disks/cpu!

  • 64 cpus … 3,000 disks

[Figure: a 1 bips cpu with 4 GB RAM and 50 disks (10,000 IOps, 75 TB)]


Exchange Server Memory Management

  • Exchange Server does not use memory beyond 4 GB efficiently

  • Exchange Server 2003 requires /3GB with more than 1 GB RAM

  • Exchange Server 2003 gains no advantage from PAE

  • AWE is not used by Exchange Server

    MSExchangeIS\VM Largest Block Size

    MSExchangeIS\VM Total 16MB Free Blocks

    MSExchangeIS\VM Total Free Blocks

    MSExchangeIS\VM Total Large Free Block Bytes


Exchange Server Processors

  • Exchange Server Mailbox Server scales well up to 8 processors

  • With more than 8 processors, hardware partitioning is usually recommended

  • With more than 8 processors, use an affinity mask to restrict Exchange Server 2003 to 8 processors

  • Possibly use the additional processors for virus scanners, etc.


SQL Server Memory Management

  • SQL Server 32-bit supports up to 64 GB

  • Usage of more than 4 GB requires fixed memory

    • Dynamic memory management is no longer possible

    • Access time not linear!!!!

  • Use 64-bit SQL Server

  • Same issues with other DBMS

Switch settings by installed memory:

  • 4 GB: PAE N, 3GB o, AWE o

  • 16 GB: PAE Y, 3GB o, AWE Y

  • 64 GB: PAE Y, 3GB N, AWE Y


Understand what the CPU does for SQL Server

[Diagram labels: CPU 0 to CPU n; Win NT Thread 0 to Thread n, each with fibers and a UMS work queue; network; NT I/O completion port; Win thread network handler]

  • UMS schedules fibers

  • Fibers write directly to clients

  • NT queues reads issued by fibers to the I/O completion port

  • Network handler is notified when I/O completes


Terminal Server: Historic Issues with Scalability

  • 32-bit systems

    • Servers often run out of kernel virtual memory rather than CPU

      • All applications must share the same 2 GB kernel address space

      • Adding RAM does not help

    • Most customers run 1-processor and 2-processor servers

      • Administrators must deploy and manage many servers

      • Reduces effectiveness of server consolidation

  • IA64 systems

    • Cannot run 32-bit applications without high overhead of WOW emulation

    • Incremental users/server outweighed by cost


x64 Editions

“First mover” Workloads:

Preliminary Testing

  • Key value

    • Core OS functionality & performance benefits (64-bit)

    • Runs most existing 32-bit apps with increased performance

    • Provides evolutionary path to 64-bit applications

  • Single code-base based on WS03 SP1

    • AMD Opteron/Athlon 64 & Intel Xeon EM64T supported with one product

  • Compatibility

    • WS03 SP1 level compatibility

    • Application kernel mode code and drivers must be 64-bit



Clones: Availability+Scalability

  • Some applications are

    • Read-mostly

    • Low consistency requirements

    • Modest storage requirement (less than 1TB)

  • Examples:

    • HTML web servers

    • LDAP servers

  • Replicate app at all nodes (clones)

  • Load Balance:

    • Spray & Sieve: requests across nodes

    • Route: requests across nodes

  • Grow: adding clones

  • Fault tolerance: stop sending to that clone
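
The clone pattern (route across nodes, grow by adding clones, stop sending to a failed clone) can be sketched as a round-robin balancer. The class `CloneBalancer` and the node names are hypothetical; a real load balancer would also need health checks and connection handling.

```python
class CloneBalancer:
    """Toy round-robin balancer over identical clones."""

    def __init__(self, clones):
        self.clones = list(clones)
        self.healthy = set(self.clones)
        self._next = 0

    def add_clone(self, clone):
        """Grow: add one more clone to the rotation."""
        self.clones.append(clone)
        self.healthy.add(clone)

    def mark_failed(self, clone):
        """Fault tolerance: stop sending requests to this clone."""
        self.healthy.discard(clone)

    def route(self):
        """Return the next healthy clone in round-robin order."""
        for _ in range(len(self.clones)):
            clone = self.clones[self._next % len(self.clones)]
            self._next += 1
            if clone in self.healthy:
                return clone
        raise RuntimeError("no healthy clones available")


lb = CloneBalancer(["web1", "web2", "web3"])
print([lb.route() for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']
lb.mark_failed("web2")
print([lb.route() for _ in range(3)])  # ['web3', 'web1', 'web3']
```

Because every clone holds the same read-mostly content, any healthy node can answer any request, which is what makes this simple rotation sufficient.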


Partitions For Scalability

  • Clones are not appropriate for some apps.

    • Stateful apps do not replicate well

    • High update rates do not replicate well

  • Examples

    • Email

    • Databases

    • Read/write file server…

    • Cache managers

    • Chat

  • Partition state among servers

  • Partitioning:

    • Must be transparent to the client

    • Split & merge partitions online
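
One way to keep partitioning transparent to the client is a lookup layer that maps each key to the server owning it. The sketch below assumes hash-based buckets; the class `PartitionMap`, the bucket count and the server names are illustrative, and the presentation does not prescribe this scheme.

```python
import hashlib


class PartitionMap:
    """Toy partition map: keys hash into buckets, each bucket is owned by a server."""

    def __init__(self, servers, buckets=16):
        # Spread bucket ownership evenly across the initial servers
        self.buckets = [servers[i % len(servers)] for i in range(buckets)]

    def bucket_of(self, key):
        """Stable bucket number for a key (independent of process or run)."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.buckets)

    def server_for(self, key):
        """What a client-facing router would call; the client never sees buckets."""
        return self.buckets[self.bucket_of(key)]

    def move_bucket(self, bucket, new_server):
        """Reassign ownership of one bucket, e.g. when splitting a busy partition."""
        self.buckets[bucket] = new_server


pm = PartitionMap(["mail1", "mail2"])
print(pm.server_for("alice"))
pm.move_bucket(pm.bucket_of("alice"), "mail3")
print(pm.server_for("alice"))  # mail3
```

The sketch only changes ownership; a real online split or merge also has to move the partition's state to the new server while requests continue.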



Siebel 7 Environment

[Diagram components: Web Client, Wireless Client, Mobile Web Client, Handheld Client, Dedicated Web Client; Wireless Gateway Server; Mobile DB (SQL CE); Web Server with Siebel Web Server Extension; Siebel Gateway Server (Connection Broker, Name Server); Siebel Enterprise Server with several Siebel Servers; Server Manager (GUI and command-line interface); EAI & Data Loading; Siebel Database; Siebel File System]




© 2004 Microsoft Corporation. All rights reserved.

This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.


Memory Latency And CPU Caches

  • CPUs are much faster than memory, and the gap continues to grow (100 MHz -> 2+ GHz vs. 80 ns -> 50 ns)

  • Caches needed to hide memory latency

  • Cache effectiveness depends on locality of memory references (e.g. cached data & code must be reused >9x before being pushed out)

  • “cacheline” = 32, 64, ... bytes (unit of replacement & collision)


Effect Of Cache Hit Ratio On Performance

1 / ( (FastTime * HitRatio) + (SlowTime * (1-HitRatio) ) )

Fast: 7 cycles for L2 hit

Slow: 150 cycles for RAM access

Actual effect depends on memory accesses per instruction
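
The formula can be evaluated directly with the slide's two figures (7 cycles for an L2 hit, 150 cycles for a RAM access). This is a minimal sketch; the function names are illustrative, and it models a single cache level only.

```python
FAST_CYCLES = 7     # L2 hit, from the slide
SLOW_CYCLES = 150   # RAM access, from the slide


def average_access_cycles(hit_ratio, fast=FAST_CYCLES, slow=SLOW_CYCLES):
    """Denominator of the slide's formula: mean cycles per memory access."""
    return fast * hit_ratio + slow * (1.0 - hit_ratio)


def relative_performance(hit_ratio, fast=FAST_CYCLES, slow=SLOW_CYCLES):
    """The slide's formula: 1 / (FastTime * HitRatio + SlowTime * (1 - HitRatio))."""
    return 1.0 / average_access_cycles(hit_ratio, fast, slow)


for hit_ratio in (1.0, 0.99, 0.95, 0.90, 0.50):
    cycles = average_access_cycles(hit_ratio)
    slowdown = cycles / FAST_CYCLES  # relative to a workload that always hits
    print(f"hit ratio {hit_ratio:4.2f}: {cycles:6.2f} cycles/access, "
          f"{slowdown:5.2f}x slower than all hits")
```

With these numbers, a 90% hit ratio already averages 21.3 cycles per access, about three times slower than a workload that always hits in L2.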


Disks Are Becoming Tapes: Consequences

  • Use most disk capacity for archiving

    • Copy on Write (COW) file system in Windows Server 2003

  • RAID10 saves arms, costs space (OK!).

  • Backup to disk

  • Pretend it is a 100 GB disk + 1 TB disk

    • Keep hot 10% of data on fastest part of disk

    • Keep cold 90% on colder part of disk

  • Organize computations to read/write disks sequentially in large blocks


12,000 User Benchmark on HP/Windows/SQL64

  • Concurrent Users

  • Server Component Throughput

SQL64 on a 4x 1.5 GHz Itanium 2 HP Integrity used 47% CPU and 13.3 GB memory, proving unprecedented price/performance for Siebel



Siebel Scalability On Available Platforms

Note: the 30,000 user tests are based on Siebel 7.0.3 and the 32,000 user test is based on 7.5.2; the transaction mix differs between the Siebel 7.0.3 and 7.5.2 test suites.



X64 Performance and Benefits

[Chart: Terminal Server Performance Test, Knowledge Worker scenario, comparing Windows 2000, Windows Server 2003 (32-bit) and Windows Server 2003 x64 on a scale from 0 to 600, annotated with 80% and 50%. Hardware: 4P AMD 64, HP DL 585]

  • Lab testing indicates increased performance

    • Up to 50% improvement in users/server on comparable hardware

    • Knowledge worker simulation

  • Largest benefit will be with 4P servers in limited virtual kernel memory scenarios

    • Opportunity for server consolidation

  • Registry Setting to Reduce Microsoft® Outlook® 2003 Periodic Polling

    • HKEY_CURRENT_USER\Software\Microsoft\Office\11.0\Outlook\RPC

      ConnManagerPoll [dword] 0x600