mysteries of windows memory management revealed

Mysteries of Windows Memory Management Revealed

Mark Russinovich

Technical Fellow

Windows Azure

(created jointly with Dave Solomon)

about me
About Me
  • Technical Fellow, Windows Azure, Microsoft
  • Cofounder and chief software architect of Winternals Software
  • Coauthor of Windows Internals book series
    • With Dave Solomon
  • Coauthor of Sysinternals Administrator’s Reference (Q1 2011?)
    • With Aaron Margosis
  • Author of Zero Day, A Novel (March 2011)
  • Author of Windows Sysinternals tools
    • Home of blog and forums
goals
Goals
  • Deep dive on:
    • Process virtual and physical memory usage
    • Operating system virtual and physical memory usage
  • Crisply define memory-related terminology
  • Highlight tools that reveal memory usage
  • Describe ‘dark spots’ in memory analysis counters and tools
agenda
Agenda
  • Virtual Memory
    • Address Space Usage
    • Process Commit
    • System Commit
  • Physical Memory
    • Working Sets
    • Paging Lists
  • Hard to Track Memory
tools we ll use
Tools We’ll Use
  • Task Manager
  • Sysinternals Process Explorer
  • Sysinternals VMMap
    • Process virtual and physical memory usage
  • Sysinternals RAMMap
    • System physical memory usage
  • Sysinternals Testlimit
    • Test program to leak different kinds of memory

Sysinternals tools are free at www.sysinternals.com

memory management fundamentals
Memory Management Fundamentals
  • Windows has demand-paged memory management
    • Processes “demand” memory as needed
    • There is no swapping
  • A page is 4 KB (8 KB on Itanium)
  • Large pages are available for improved TLB usage
    • x86: 4 MB
    • x64 and x86 PAE: 2 MB
    • Itanium: 16 MB
  • There is NO (well, almost no) connection between virtual memory and physical memory
32 bit x86 address space
32-bit x86 Address Space
  • 32-bits = 2^32 = 4 GB
    • /3GB and /USERVA can extend the process address space up to 3 GB
    • Process must be marked “large address space aware” to use memory above 2 GB

[Figure: 32-bit address space layouts. Default: 2 GB per-process space and 2 GB system space. With 3 GB user space: 3 GB per-process space and 1 GB system space.]

64 bit address spaces
64-bit Address Spaces
  • 64-bits = 2^64 = 17,179,869,184 GB
    • x64 today supports 48 bits virtual = 262,144 GB = 256 TB
    • IA-64 today supports 50 bits virtual = 1,048,576 GB = 1024 TB
    • 64-bit Windows supports 44 bits = 16,384 GB = 16 TB

[Figure: 64-bit address space layouts. x64 (AMD64 and Intel 64): 8 TB per-process space and 8 TB system space. IA-64: 7 TB per-process space and 7 TB system space.]
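
The address space sizes quoted on the two address-space slides are plain powers of two. A minimal sketch of that arithmetic follows; the helper name `address_space_bytes` is ours, for illustration only.

```python
GB = 2**30
TB = 2**40


def address_space_bytes(bits):
    """Size in bytes of an address space with the given number of address bits."""
    return 2**bits


print(address_space_bytes(32) // GB, "GB")        # 32-bit x86: 4 GB
print(f"{address_space_bytes(64) // GB:,} GB")    # full 64 bits: 17,179,869,184 GB
print(address_space_bytes(48) // TB, "TB")        # x64 hardware: 256 TB
print(address_space_bytes(50) // TB, "TB")        # IA-64 hardware: 1024 TB
print(address_space_bytes(44) // TB, "TB")        # 64-bit Windows: 16 TB
```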

virtual address space components
Virtual Address Space Components
  • Address space breakdown
    • Private (e.g. process heap)
      • Reserved or committed
    • Shareable (e.g. EXE, DLL, shared memory, other memory mapped files)
      • Reserved or committed
    • Free (not yet defined)
  • Performance counters available:
    • Private Bytes – committed private memory
    • Virtual Bytes – total of shareable+private (including reserved)
    • No separate counters for Shareable or Reserved or Free
why reserve memory
Why Reserve Memory?
  • Reserved memory lets an application lazily commit contiguous memory
  • Used for stack and heap expansion

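[Figure: thread stack before and after expansion. The stack grows down. In both states the committed pages sit above a guard page, with the remaining reserved pages below it; after expansion the committed region is larger.]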
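
The reserve-then-commit pattern for a stack can be modeled in a few lines. This is a toy simulation, not how Windows implements stacks: the class `ReservedStack` and its methods are hypothetical names, and the guard page is not counted as committed here.

```python
PAGE_SIZE = 4096  # 4 KB pages, as on x86/x64


class ReservedStack:
    """Toy thread stack: pages [0, guard) are committed, page `guard` is the
    guard page, and the remaining pages are reserved but not committed.
    Page 0 is the top of the stack; higher indices are lower addresses."""

    def __init__(self, reserved_pages, committed_pages=1):
        self.reserved_pages = reserved_pages
        self.guard = committed_pages

    def touch(self, page):
        if page < self.guard:
            return "ok"                      # already committed
        if page == self.guard:
            if self.guard + 1 >= self.reserved_pages:
                raise MemoryError("stack overflow: no reserved pages left")
            self.guard += 1                  # commit this page, move the guard down
            return "expanded"
        raise MemoryError("touched reserved memory beyond the guard page")

    def committed_bytes(self):
        return self.guard * PAGE_SIZE


stack = ReservedStack(reserved_pages=256)    # 256 pages = 1 MB reserved
print(stack.touch(0), stack.touch(1), stack.committed_bytes())
```

Only the pages actually touched are committed; the rest of the 1 MB range stays reserved, so it costs address space but almost no commit charge.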

viewing address space breakdown
Viewing Address Space Breakdown
  • Task Manager only lets you see private bytes
    • Before Vista: column called “VM Size”
    • Vista and later: column called “Commit Size”
  • Process Explorer shows both virtual size and private bytes
    • Add 2 columns to process list
      • Virtual Size
      • Private Bytes
    • Run Testlimit twice
      • Testlimit -r
      • Testlimit -m
    • Note: if on 64-bit Windows, 32-bit Testlimit can grow to 4 GB
understanding process address space usage
Understanding Process Address Space Usage
  • Most virtual memory problems are due to a process leaking private committed memory
    • Heap, GC heap, language heaps (CRT)
  • Private Bytes only tells part of the story
    • Doesn’t account for shareable memory that’s not shared (e.g. DLLs loaded only by this process)
    • Fragmentation can be an issue
      • Address space can effectively be exhausted prematurely
  • Basic performance counters don’t provide enough information to troubleshoot

[Figure: fragmented address space.]

viewing processes with vmmap
Viewing Processes with VMMap
  • VMMap shows detailed breakdown of process address space:
    • Private process memory
      • Copy-on-write
      • Private (VirtualAlloc)
      • Heap and GC Heap
      • Stack
    • Shareable process memory
      • Shareable – shareable memory
      • Mapped File – memory mapped files
    • Page table – page table pages
  • Note that “shareable” types can have private commitment
    • Read/write pages in shared memory
    • Copy-on-write pages
viewing fragmentation
Viewing Fragmentation
  • Fragmentation is visible by selecting Options->Show Free Regions, selecting the Free type, and sorting by size
    • Largest free block is largest allocation possible
  • Clickable fragmentation map in View->Fragmentation View
  • Run testlimit -t on 64-bit Windows
    • Threads need 256 KB 64-bit stack and 1 MB 32-bit stack
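
Why the largest free block matters more than total free space can be shown with a small calculation. The sketch below uses a made-up allocation pattern (1 MB allocated every 64 MB in a 2 GB address space); `free_regions` is a hypothetical helper, not part of VMMap.

```python
MB = 2**20


def free_regions(allocated, space_size):
    """Return (start, size) for each free gap, given (start, size) allocations."""
    gaps = []
    cursor = 0
    for start, size in sorted(allocated):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + size)
    if cursor < space_size:
        gaps.append((cursor, space_size - cursor))
    return gaps


space = 2048 * MB                                        # 2 GB user address space
allocs = [(i * 64 * MB, 1 * MB) for i in range(32)]      # hypothetical layout
gaps = free_regions(allocs, space)
total_free = sum(size for _, size in gaps)
largest = max(size for _, size in gaps)

print(total_free // MB, "MB free in total")              # 2016 MB
print(largest // MB, "MB largest free block")            # 63 MB
```

Although 2016 MB is free, no single allocation larger than 63 MB can succeed in this layout, which is what "exhausted prematurely" means for a fragmented address space.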
file mappings
File Mappings
  • File mapping enables an application to read and write file data through memory operations
  • File mappings are used for
    • Image (.EXE and .DLL) loading: “Image” in VMMap
    • Data files access (e.g. NLS files): “Mapped File” in VMMap
    • “Pagefile-backed” shared memory: “Shareable” in VMMap
  • Entire file doesn’t have to be mapped
    • Allows for “windows” into the file

[Figure: a region of the process address space mapped onto part of Database.db.]
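
A "window" into a file can be tried with Python's standard `mmap` module, which wraps the platform's file mapping facility. This is a sketch under assumptions: the file contents are invented, and the file is named Database.db only to echo the figure.

```python
import mmap
import os
import tempfile

# Mapping offsets must be a multiple of the allocation granularity.
granule = mmap.ALLOCATIONGRANULARITY

# Create a hypothetical data file three granules long.
path = os.path.join(tempfile.mkdtemp(), "Database.db")
with open(path, "wb") as f:
    f.write(b"A" * granule + b"B" * granule + b"C" * granule)

with open(path, "r+b") as f:
    # Map only the middle granule: a window into the file, not the whole file.
    with mmap.mmap(f.fileno(), length=granule, offset=granule) as view:
        first_byte = view[0:1]      # read file data through memory
        view[0:5] = b"hello"        # write file data through memory
        view.flush()

# Confirm the write reached the file at the window's offset.
with open(path, "rb") as f:
    f.seek(granule)
    data_after = f.read(5)

print(first_byte, data_after)
```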

tracing file mapping with process monitor
Tracing File Mapping with Process Monitor
  • Procmon can trace image loader activity
vmmap differencing
VMMap Differencing
  • Press F5 to refresh the view
  • VMMap keeps all snapshots
    • Use the timeline to select snapshots to compare
tracing with vmmap
Tracing with VMMap
  • You can launch a process with profiling
    • Detours tracks virtual and heap activity
the system commit limit
The System Commit Limit
  • System committed virtual memory must be backed either by physical memory or stored in the paging file
    • Sum of (most of) physical memory and current paging files
  • Allocations charged against the system commit limit:
    • Process private bytes
    • Pagefile-backed shared memory
    • Copy-on-write pages
    • Read/write file pages
    • System paged and nonpaged code and data
  • When limit is reached, virtual memory allocations fail
    • Processes may crash (or corrupt data)
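
The accounting rule can be illustrated with a toy model. It is not how Windows implements commit charging: the class `SystemCommit` is hypothetical, and it counts all of RAM toward the limit where the slide says "(most of)".

```python
class SystemCommit:
    """Toy model of system commit accounting."""

    def __init__(self, ram_mb, pagefile_mb):
        # Simplification: all of physical memory counts toward the limit.
        self.limit_mb = ram_mb + pagefile_mb
        self.charge_mb = 0
        self.peak_mb = 0

    def commit(self, mb):
        """Charge an allocation, failing once the limit would be exceeded."""
        if self.charge_mb + mb > self.limit_mb:
            raise MemoryError("system commit limit reached")
        self.charge_mb += mb
        self.peak_mb = max(self.peak_mb, self.charge_mb)

    def release(self, mb):
        self.charge_mb -= mb


system = SystemCommit(ram_mb=4096, pagefile_mb=4096)
system.commit(6000)          # fits: 6000 <= 8192
try:
    system.commit(3000)      # 9000 > 8192, so the allocation fails
    failed = False
except MemoryError:
    failed = True

print(system.charge_mb, system.peak_mb, failed)
```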
changing the system commit limit
Changing the System Commit Limit
  • You can increase the system commit limit by adding RAM or increasing the pagefile size
  • The system commit limit can grow if paging files are configured to expand
    • So the system commit limit might be the current limit, not the maximum
    • Default configuration (“System Managed”):
      • Minimum: 1.5x RAM if RAM < 1 GB; RAM otherwise
      • Maximum: 3x RAM or 4 GB, whichever is larger
  • Maximum system commit limit should be based on system commit peak for extreme workload
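
The "System Managed" defaults listed above translate directly into a small function. This sketch encodes only the two rules as stated on the slide; the function name is ours.

```python
def system_managed_pagefile_gb(ram_gb):
    """Return (minimum, maximum) paging file size in GB per the stated defaults."""
    minimum = 1.5 * ram_gb if ram_gb < 1 else ram_gb
    maximum = max(3 * ram_gb, 4)
    return minimum, maximum


print(system_managed_pagefile_gb(0.5))   # (0.75, 4)
print(system_managed_pagefile_gb(8))     # (8, 24)
```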
viewing system commit usage
Viewing System Commit Usage
  • Performance Counters:
    • Committed Bytes
    • Commit Limit
  • Task Manager
    • XP: commit charge labeled “PF Usage”
    • Vista: commit charge labeled “Page File”
    • Win7: commit charge labeled “Commit”
    • Vista and Win7 show commit limit after slash
viewing the system commit limit
Viewing the System Commit Limit
  • Process Explorer shows commit charge (with history), commit limit, and commit peak
    • No built-in tool shows peak any more
exhausting the system commit limit
Exhausting the System Commit Limit
  • On a 32-bit system, run “Testlimit -m” multiple times until the system commit limit is exhausted
  • On 64-bit systems, “Testlimit64 -m” will exhaust the system commit limit before its address space
sizing the paging file
Sizing the Paging File
  • If you have enough RAM to support your commit needs, why even have one?
    • System can page out unused, modified private pages vs keeping them in RAM
    • More RAM available for useful stuff
  • Many recommendations use a formula based on RAM (1.5x, 2x, etc.)
    • Actually, the more RAM, the smaller the paging file needed
    • Should be based on workload usage of committed virtual memory
  • Look at commit peak after workload has run
    • Pre-Vista: Task Manager
    • Vista+: Process Explorer
    • Apply a formula to that to give buffer (1.5x or 2x)
    • Make sure it’s big enough to hold a kernel crash dump
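
One way to turn the commit-peak approach into a number is sketched below. It rests on an assumption that is ours, not the slide's: since the commit limit is roughly RAM plus paging file, the paging file only has to cover the part of the buffered peak that exceeds RAM. The function name, the default buffer of 1.5x, and the crash dump size are all illustrative.

```python
def pagefile_size_mb(commit_peak_mb, ram_mb, buffer_factor=1.5, crash_dump_mb=0):
    """Suggest a paging file size from an observed system commit peak."""
    needed_mb = commit_peak_mb * buffer_factor - ram_mb
    # Never go below the space needed for the configured crash dump.
    return max(needed_mb, crash_dump_mb, 0)


# Peak of 12000 MB on an 8 GB machine: 12000 * 1.5 - 8192
print(pagefile_size_mb(commit_peak_mb=12000, ram_mb=8192))

# Plenty of RAM: only the (hypothetical) 800 MB crash dump requirement remains
print(pagefile_size_mb(commit_peak_mb=4000, ram_mb=16384, crash_dump_mb=800))
```

The second call shows the point made above: the more RAM a machine has relative to its commit peak, the smaller the paging file it needs.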
working set list
Working Set List
  • All the physical pages “owned” by a process
    • I.e. the pages the process can reference without incurring a page fault
  • A process always starts with an empty working set
    • It then incurs page faults when referencing a page that isn’t in its working set
    • Many page faults may be resolved from memory

[Figure: the working set, ordered from newer pages to older pages.]

working set
Working Set
  • Each process has a default working set minimum and maximum
    • Can change with SetProcessWorkingSetSize
    • Working set minimum controls maximum number of locked pages (VirtualLock)
    • Minimum is also reserved from RAM as a guarantee to the process
    • Working set maximum is ignored
  • If there’s ample memory, process working set represents all the memory it has referenced (but not freed)
    • If memory is tight, working sets get trimmed
working set replacement
Working Set Replacement

[Figure: pages removed from the working set go to the standby or modified page list.]

  • When the memory manager decides the process is large enough, it gives up pages to make room for new pages
  • Local page replacement policy
    • Means that a single process cannot take over all of physical memory unless other processes aren’t using it
    • Page replacement algorithm is least recently accessed (pages are aged when available memory is low)

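
Least-recently-accessed replacement within a single process can be sketched with an ordered dictionary. This is a toy model, not the real algorithm, which also ages pages when memory is low; the class `WorkingSet` and the fixed `max_pages` limit are hypothetical.

```python
from collections import OrderedDict


class WorkingSet:
    """Toy per-process working set with least-recently-accessed replacement."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.pages = OrderedDict()   # page -> dirty flag, least recent first
        self.evicted = []            # (page, destination list) in eviction order

    def access(self, page, write=False):
        if page in self.pages:
            # Re-insert so the page becomes the most recently accessed.
            dirty = self.pages.pop(page) or write
        else:
            dirty = write
            if len(self.pages) >= self.max_pages:
                victim, victim_dirty = self.pages.popitem(last=False)
                destination = "modified" if victim_dirty else "standby"
                self.evicted.append((victim, destination))
        self.pages[page] = dirty


ws = WorkingSet(max_pages=2)
ws.access("A", write=True)
ws.access("B")
ws.access("A")        # A is now more recent than B
ws.access("C")        # evicts B, which is clean
ws.access("D")        # evicts A, which is dirty
print(ws.evicted)
```

Because the limit is per process, this process replaces its own pages instead of taking pages from others, which is the point of a local replacement policy.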

working set breakdown
Working Set Breakdown
  • Consists of 2 types of pages:
    • Shareable (of which some may be shared)
    • Private
  • Four performance counters available:
    • Working Set Shareable
      • Working Set Shared (subset of shareable that are currently shared)
    • Working Set Private
    • Working Set Size (total of WS Shareable+Private)
      • Note: adding this up for each process overcounts shared pages
  • Caveats:
    • Working set does not include trimmed memory that is still cached
    • Shareable working set should be viewed as “private” if it’s not shared
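
The overcounting caveat is easy to see with sets. In this sketch each number stands for one physical page, and the two processes and their page numbers are invented for illustration.

```python
# Hypothetical processes; each number stands for one physical page.
processes = {
    "a.exe": {"private": {1, 2, 3}, "shareable": {100, 101}},
    "b.exe": {"private": {4, 5}, "shareable": {100, 101, 102}},
}


def working_set_size(proc):
    """Working set size = private pages + shareable pages."""
    return len(proc["private"]) + len(proc["shareable"])


# Adding up each process's working set counts pages 100 and 101 twice.
naive_total = sum(working_set_size(p) for p in processes.values())

# The physical memory actually in use is the set of distinct pages.
unique_pages = set()
for p in processes.values():
    unique_pages |= p["private"] | p["shareable"]

# Shared = shareable pages present in more than one working set.
shareable_sets = [p["shareable"] for p in processes.values()]
shared = shareable_sets[0] & shareable_sets[1]

print(naive_total, len(unique_pages), sorted(shared))   # 10 8 [100, 101]
```

Page 102 is shareable but used only by b.exe, so for accounting purposes it behaves like private memory of that process.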
viewing working set with task manager
Viewing Working Set with Task Manager
  • Displays private working set size
    • Calls it “Memory (Private Working Set)”
viewing working set with process explorer
Viewing Working Set with Process Explorer
  • Process Explorer shows all the performance counters
    • Virtual Bytes
    • Private Bytes
    • WS Shareable Bytes
    • WS Shared Bytes
    • WS Private Bytes
  • Run Testlimit three times:
    • Testlimit -r 1024 -c 1
    • Testlimit -m 1024 -c 1
    • Testlimit -d 1024 -c 1
  • Note how working set numbers don’t at all represent the process virtual memory usage
viewing the working set with vmmap
Viewing the Working Set with VMMap
  • VMMap shows working set size of each component of address space
  • Also shows locked pages
  • Copy-on-write pages will show up as Private WS in shareable regions
how copy on write works before
How Copy-On-Write Works: Before

[Figure: two process address spaces mapping the same pages of physical memory (Page 1, Page 2, Page 3), which hold the original data.]


how copy on write works after
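How Copy-On-Write Works: After

[Figure: the same two process address spaces after a write to page 2. Physical memory holds Page 1, Page 2, Page 3, and a copy of page 2; one process still sees the original data and the other sees the modified data.]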
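
The before and after states can be reproduced with a toy model in which page tables are dictionaries from virtual page to physical frame. The process names, frame numbers, and the `write` helper are all hypothetical.

```python
# Physical memory: frame number -> contents.
physical = {1: "orig data 1", 2: "orig data 2", 3: "orig data 3"}

# Before: both processes map the same three frames.
page_tables = {
    "P1": {0: 1, 1: 2, 2: 3},
    "P2": {0: 1, 1: 2, 2: 3},
}


def write(proc, vpage, data):
    """Write to a page, copying it first if another process still maps it."""
    frame = page_tables[proc][vpage]
    sharers = sum(1 for table in page_tables.values() if frame in table.values())
    if sharers > 1:
        new_frame = max(physical) + 1
        physical[new_frame] = physical[frame]     # private copy of the page
        page_tables[proc][vpage] = new_frame
        frame = new_frame
    physical[frame] = data


write("P2", 1, "modified data")

print(page_tables["P1"][1], physical[page_tables["P1"][1]])
print(page_tables["P2"][1], physical[page_tables["P2"][1]])
```

After the write, P1 still maps the original frame while P2 maps a new private frame, so the copy is charged to P2 alone.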

managing physical memory
Managing Physical Memory
  • System keeps unassigned physical pages on one of several lists
    • Free page list
    • Modified page list
    • Standby page lists (8 as of Vista & later)
    • Zero page list
    • ROM page list
    • Bad page list - pages that failed memory test at system startup
  • Lists are implemented by entries in the “PFN database”
    • Maintained as FIFO lists or queues
paging dynamics
Paging Dynamics
  • New pages are allocated to working sets from the top of the free or zero page list
  • Pages released from the working set due to working set replacement go to the bottom of:
    • The modified page list (if they were modified while in the working set)
    • The standby page list (if not modified)
      • Decision made based on “D” (dirty = modified) bit in page table entry
    • Association between the process and the physical page is still maintained while the page is on either of these lists
standby and modified page lists
Standby and Modified Page Lists
  • Modified pages go to modified (dirty) list
    • Avoids writing pages back to disk too soon
  • Unmodified pages go to standby (clean) lists
  • They form a system-wide cache of “pages likely to be needed again”
    • Pages can be faulted back into a process from the standby and modified page list
    • These are counted as page faults, but not page reads
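
The difference between a page fault and a page read can be modeled as follows. This is a simplified sketch: the class `Memory` is hypothetical, and "hard fault" is our label for a fault that needs a read from disk.

```python
class Memory:
    """Toy model of one working set plus the standby and modified lists."""

    def __init__(self):
        self.working_set = set()
        self.standby = set()
        self.modified = set()
        self.page_faults = 0
        self.page_reads = 0

    def trim(self, page, dirty):
        """Remove a page from the working set onto the matching list."""
        self.working_set.discard(page)
        (self.modified if dirty else self.standby).add(page)

    def reference(self, page):
        if page in self.working_set:
            return "hit"
        self.page_faults += 1
        for page_list in (self.standby, self.modified):
            if page in page_list:
                page_list.remove(page)
                self.working_set.add(page)
                return "soft fault"       # resolved from memory, no disk read
        self.page_reads += 1              # not cached anywhere: read from disk
        self.working_set.add(page)
        return "hard fault"


mem = Memory()
first = mem.reference(1)      # nothing cached yet
mem.trim(1, dirty=False)      # page moves to the standby list
second = mem.reference(1)     # faulted back in from standby
print(first, second, mem.page_faults, mem.page_reads)
```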
modified page writer
Modified Page Writer
  • When modified list reaches certain size, modified page writer system thread is awoken to write pages out
    • Also triggered when memory is overcommitted (too few free pages)
    • Does not flush entire modified page list
  • Two system threads
    • One for mapped files, one for the paging file
  • Pages move from the modified list to the standby list
    • I.e. can still be soft faulted into a working set
free and zero page lists
Free and Zero Page Lists
  • Free Page List
    • Used for page reads
    • Private modified pages go here on process exit
    • Pages contain junk in them (i.e. not zeroed)
    • On most busy systems, this is empty
  • Zero Page List
    • Used to satisfy demand zero page faults
      • References to private pages that have not been created yet
    • When free page list has 8 or more pages, a priority zero thread is awoken to zero them
    • On most busy systems, this is empty too
paging dynamics

Paging Dynamics

[Figure: flow of pages among Working Sets and the Standby, Modified, Free, Zero, and Bad Page Lists. Labeled transitions: page read from disk or kernel allocations, demand zero page faults, “soft” page faults, “global valid” faults, working set replacement, modified page writer, zero page thread, private pages at process exit.]


viewing the paging lists with task manager
Viewing the Paging Lists with Task Manager
  • XP/2003:
    • Available = Standby + Zero + Free
    • System Cache = Standby + Modified + System Working Set
  • Vista/Server 2008:
    • Replaced Available with Free
      • Free + Zero list
    • System Cache relabeled Cached
  • Windows 7/Server 2008 R2
    • Available put back
viewing the paging lists with process explorer
Viewing the Paging Lists with Process Explorer
  • Process Explorer shows each paging list
    • Click View->System Information
total process private memory usage
Total Process Private Memory Usage
  • Working Set size does not include:
    • Private memory on standby or modified lists
    • Page tables
  • RAMMap shows this on Processes tab
viewing memory usage with rammap
Viewing Memory Usage with RAMMap
  • In addition to showing size of paging lists, shows usage breakdown:
    • Process private
    • Mapped file
    • Shared memory
    • Page tables
    • Paged pool
    • Nonpaged pool
    • System PTE
    • Session private
    • Metafile
    • AWE
    • Driver locked
    • Kernel stack
prioritized standby lists
Prioritized Standby Lists

  • In Vista & later, there are 8 prioritized standby lists
  • Pages are removed from lowest priority list first
    • Low memory priority process will keep re-using low priority pages
    • Higher priority information remains cached

[Figure: the prioritized standby lists, showing where pages are added and where pages are removed.]
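
Repurposing from the lowest priority list first can be sketched with eight queues. This is a toy model with hypothetical function names; each list is treated as FIFO.

```python
from collections import deque

# One standby list per priority, index 0 (lowest) through 7 (highest).
standby = [deque() for _ in range(8)]


def add_page(page, priority):
    """Place a page at the tail of the standby list for its priority."""
    standby[priority].append(page)


def repurpose():
    """Take the oldest page from the lowest-priority non-empty list."""
    for page_list in standby:
        if page_list:
            return page_list.popleft()
    return None


add_page("normal page", 5)
add_page("low priority page", 1)
add_page("pre-trained page", 7)

order = [repurpose(), repurpose(), repurpose(), repurpose()]
print(order)
```

The low priority page is taken first and the priority 7 page last, so a low memory priority process recycles its own pages before it disturbs anything more valuable.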

superfetch
SuperFetch™
  • Superfetch proactively repopulates RAM with the most useful data
    • Sets priority of pages to optimal value, based on the page history and other analysis that it performs
    • Takes into account frequency of page usage, usage of page in context of other pages in memory
    • Adapts to application launch patterns, tracked by time of day (in chunks of 8 hours) and by weekend vs weekday
  • Scenarios SuperFetch improves include
    • Resume from hibernate and suspend
    • Fast user switching
    • Performance after infrequent or low priority tasks execute
    • Application launch
  • Windows 7: Disabled if the OS is booted off an SSD
memory priority
Memory Priority
  • Each thread has its own memory priority
    • 5: normal
    • 1: low
  • This determines which standby list is used for the page (when/if it arrives on the standby list)
  • Thread memory priority comes from process memory priority
    • Can be changed for process or individual thread
      • SetPriorityClass or SetThreadPriority “background mode”
standby list population
Standby List Population
  • Priority 7 pages come from a static set (pre-trained at Microsoft)
    • Pre-populated at each boot
    • Includes pages related to user input that requires fast responsiveness (right-click, desktop properties, control panel, start menu, etc.)
  • Priority 6 are pages that SuperFetch considers important, or useful (will rarely get repurposed)
  • Priority 5 are standard user pages (memory priority 5)
  • Priority 1 are low priority user pages (memory priority 1)
  • Priorities 0-4 may hold pages that SuperFetch has decayed, cache manager read-ahead pages, and page fault clustering pages
how much of the standby list has been consumed
How Much of the Standby List has Been Consumed?
  • RAMMap shows the amount of memory repurposed off each standby list since boot:
what file data is in the standby lists
What File Data is In the Standby Lists?
  • Viewing Cached Files with Rammap
do you have enough memory
Do You Have Enough Memory?
  • There’s no sure-fire rule or counter to tell you if you have enough memory
  • The general sign of too little memory: available memory remains generally low
    • Use Perfmon to monitor available memory over time
    • Use Process Explorer, or on Vista and later, Task Manager, to monitor physical memory usage
    • Use Process Explorer, or Task Manager to see instantaneous value of available memory
    • Watch in Process Monitor for excessive reads from paging file
tracing paging with procmon
Tracing Paging with Procmon
  • Procmon distinguishes paging I/Os in the details column
    • Can set filter for “detail contains paging”
  • I/O to Pagefile.sys excluded by default
    • Enable advanced output or remove exclude filter
    • Excessive reads from the paging file indicate a need for more RAM
hidden cost of reserved memory
Hidden Cost of Reserved Memory
  • Memory Manager charges for page tables for reserved address space not yet committed
    • Charged against process private bytes (and therefore system commit limit)
    • Cannot track this down
  • Experiment:
    • Testlimit64 -r 100000 -c 10
    • Testlimit64 -r
      • Reserves ~8192 GB
      • Private bytes grows to >16 GB!
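
The figures in this experiment are consistent with simple page table arithmetic. The sketch assumes 8-byte page table entries on x64 and counts only the lowest-level page tables; both assumptions are ours, not stated on the slide.

```python
PAGE_SIZE = 4 * 2**10      # 4 KB pages
PTE_SIZE = 8               # assumption: 8-byte page table entries on x64
GB = 2**30


def leaf_page_table_bytes(reserved_bytes):
    """Bytes of lowest-level page tables needed to map the reserved range."""
    pages = reserved_bytes // PAGE_SIZE
    return pages * PTE_SIZE


cost = leaf_page_table_bytes(8192 * GB)
print(cost // GB, "GB")    # 16 GB
```

Higher-level page tables add a little more on top, which fits the observed private bytes of just over 16 GB.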
cost of reserved memory
Cost of Reserved Memory
  • Virtual Address Descriptors (VADs) come from nonpaged pool
  • Example:
    • Testlimit64 -r results in 640 MB of nonpaged pool usage for VADs
    • Poolmon shows this:
shared memory
Shared Memory
  • Shared memory is backed by virtual memory
    • Either paging file (if there is one), else physical memory
  • However, amount created not charged to process commit limit
    • Therefore, a shared memory VM leak is hard to track down
demo shared memory leak
Demo: Shared Memory Leak
  • Testlimit -s 1000 -c 3
    • This creates a 3 GB shared memory section
  • Note that process virtual and private bytes do not include this value
    • And only virtual bytes rise when process maps the section into its address space
virtuallock locked pages
VirtualLock Locked Pages
  • No special privilege is required to VirtualLock pages (as of Vista)
    • Allows process to allocate non-paged memory
  • Locked memory has a major impact on the system:
    • Overrides memory management policies
    • Prevents contiguous physical memory allocation
    • Can prevent hibernation
  • Testlimit -l 1024 -c 1
  • DWM locks memory:
driver locked pages and awe pages
Driver Locked Pages and AWE pages
  • VPC and Hyper-V use driver locked pages
    • Counts against system commit, but not otherwise detectable
    • No way to track back to owner
awe memory
AWE Memory
  • Address Windowing Extensions (AWE) allows processes to directly control physical memory
    • Way to override the memory manager’s caching algorithms
    • Can be more than can be contained in the address space
  • AWE memory is not accounted to the owning process
    • Thus, available physical memory drops, but no process accounts for usage
demo awe memory leak
Demo: AWE Memory Leak
  • Testlimit -a 1000 -c 1
    • Allocates 1000 MB AWE memory
more information
More Information
  • More information in Windows Internals, 5th Edition
    • Memory Management chapter
  • MSDN memory management API documentation
  • The best way to gain an understanding of memory manager behavior is to experiment and observe
  • Come to my other sessions:
    • Inside Windows Azure: 10:30
    • Case of the Unexplained: 12:30
  • Sysinternals Primer: 9:30
    • Aaron Margosis and Tim Reckmeyer
session evaluations

Session Evaluations

Tell us what you think, and you could win!

All evaluations submitted are automatically entered into a daily prize draw* 

Sign-in to the Schedule Builder at http://europe.msteched.com/topic/list/

* Details of prize draw rules can be obtained from the Information Desk.