
A Measurement Based Memory Performance Evaluation of High Throughput Servers

This study evaluates the memory performance of high-throughput servers in streaming media, web, and IP-forwarding applications, observing the impact of the OS on memory performance and identifying bottlenecks.



Presentation Transcript


  1. A Measurement Based Memory Performance Evaluation of High Throughput Servers
  Garba Isa Yau, Department of Computer Engineering, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
  April 14, 2003

  2. Motivation
  • CPU–memory speed gap
    • CPU speed doubles in about 18 months (Moore's Law)
    • Memory access time improves by only one-third in 10 years
  • Network bandwidth has improved significantly
    • Gigabit per second already deployed on LANs
    • NICs operate at up to 10 Gbps
    • Ethernet switches are also available in that range
  • Hierarchical memory architecture was introduced to alleviate the CPU–memory speed gap
  • It works on locality of reference of data
    • temporal locality
    • spatial locality
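The two kinds of locality above can be made concrete with a small, hypothetical traversal (not from the talk): in a compiled language, the row-major loop benefits from spatial locality because consecutive accesses touch adjacent memory, while the column-major loop jumps a full row between accesses.

```python
# Minimal sketch of spatial locality: two traversals of the same matrix.
# (Illustrative only; in CPython the cache effect is muted by interpreter
# overhead, but the access patterns are what a compiled loop would issue.)
N = 512
matrix = [[1] * N for _ in range(N)]

# Row-major: consecutive accesses hit adjacent elements (good spatial locality).
row_major_sum = sum(matrix[i][j] for i in range(N) for j in range(N))

# Column-major: each access jumps to a different row (poor spatial locality).
col_major_sum = sum(matrix[i][j] for j in range(N) for i in range(N))

assert row_major_sum == col_major_sum == N * N
```

Temporal locality, by contrast, would mean revisiting the same small block of elements repeatedly — exactly what streaming data (next section) fails to do.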

  3. Motivation
  • Do all applications benefit from the memory hierarchy?
    • some data have poor temporal locality (continuous data)
    • the working set might be too large to fit into the cache, even if the data has good spatial locality
    • some data are never reused
  • For applications using these types of data, the hierarchical memory architecture becomes ineffective
  SO WHERE EXACTLY IS THE BOTTLENECK?

  4. Streaming Media Servers
  • Streaming media content is continuous data
    • the working set is normally large and cannot fit into the cache
    • it has very poor temporal locality (data reuse is poor)
  • A typical scenario of a streaming media transaction (RTSP)
  [Diagram: RTSP, RTP and RTCP exchanges between client and server, with memory accesses on the server side]

  5. Typical data flow in streaming using RTP
  • The transaction has:
    • stringent timing requirements
    • high bandwidth requirements
    • CPU-intensive processing
    • high memory requirements
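The "stringent timing" point can be quantified with a back-of-the-envelope sketch (the payload size is an assumption, not a figure from the study): at the 300 kbps encoding rate used in the experiments, with roughly 1 KB RTP payloads, the server must emit a packet about every 27 ms — per stream — or the client stalls.

```python
# Illustrative arithmetic: per-stream packet pacing for a streaming server.
# The 1000-byte payload is an assumed value, not taken from the experiments.
encoding_rate_bps = 300_000   # 300 kbps stream, as in the study's workload
payload_bytes = 1_000         # assumed RTP payload size

packets_per_second = encoding_rate_bps / (payload_bytes * 8)
inter_packet_gap_ms = 1000 / packets_per_second

print(f"{packets_per_second:.1f} packets/s, one every {inter_packet_gap_ms:.1f} ms")
```

With hundreds of concurrent streams, these deadlines multiply, which is why the page-fault stalls shown later translate directly into client timeouts.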

  6. Web Servers
  • Web content is normally a set of small files that make up a web document
    • the working set is normally composed of small files (average aggregate size is 10 KB)
    • poor temporal locality
    • little or no data reuse
  • Web transaction (HTTP)
  [Diagram: HTTP exchange between client and server, with memory accesses]

  7. Typical data flow in an HTTP transaction
  • The transaction has:
    • relaxed timing requirements
    • but still high bandwidth requirements
    • a high connection rate (connections are established and torn down within a short time under HTTP/1.0)
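The relationship between file size, transaction rate, and throughput that the later web-server slides measure can be sketched with illustrative numbers (these are assumptions for the arithmetic, not the study's measurements): aggregate throughput is simply file size times transaction rate, so small files need a very high connection rate to saturate the link.

```python
# Sketch of the size/rate trade-off: throughput = file size x transaction rate.
# All numbers below are illustrative assumptions.
def throughput_mbps(file_size_kb: float, transactions_per_sec: float) -> float:
    """Aggregate server throughput in megabits per second."""
    return file_size_kb * 1024 * 8 * transactions_per_sec / 1e6

# Many small files: high connection rate, modest throughput.
small = throughput_mbps(10, 2000)   # avg 10 KB documents at 2000 txn/s
# Few large files: low connection rate, higher throughput.
large = throughput_mbps(1024, 50)   # 1 MB documents at 50 txn/s

assert large > small
```

This is why HTTP/1.0's per-request connection setup and teardown dominates the cost for small documents: the server does far more connection work per byte delivered.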

  8. IP Forwarding
  • IP packets are generally small (the maximum is 65,535 bytes)
  • Due to datagram fragmentation by routers, packets are typically less than 1.5 KB (MTU issue)
  • packets are just forwarded; no data associated with any packet is reused
  • apart from the need for high speed, no strict timing has to be maintained
  • At high throughput, a lot of memory copying is involved: moving a lot of data (IP headers) into the cache for processing
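The fragmentation point above can be illustrated with a small helper (a sketch, not code from the study): IPv4 fragment payloads (except the last) must be multiples of 8 bytes, so a standard 1500-byte Ethernet MTU carries at most 1480 payload bytes per fragment.

```python
# Sketch: how many fragments an IPv4 datagram payload needs for a given MTU.
IP_HEADER = 20  # bytes, assuming a header with no options

def fragment_count(datagram_payload: int, mtu: int = 1500) -> int:
    # Largest 8-byte-aligned payload that fits in one fragment.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return -(-datagram_payload // per_fragment)  # ceiling division

assert fragment_count(1480) == 1   # fits in a single Ethernet frame
assert fragment_count(4000) == 3   # 1480 + 1480 + 1040 bytes
```

Since each fragment is forwarded independently and its data is never reused, a router sees a stream of small, cache-unfriendly units of work.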

  9. Typical data flow in IP forwarding

  10. Server Platform
  • Pentium 4 processor (2.0 GHz):
    • L1 cache: 8 KB
    • L2 cache: 512 KB
  • Peripherals:
    • 1 Gbps NIC
    • 40 GB EIDE hard drive (Western Digital WD400)
  • Main memory: 256 MB
  • Operating systems:
    • Linux Red Hat 7.2 (kernel 2.4.7-10)
    • Windows 2000 Server
  • Network (LAN):
    • 1 Gbps layer-2 switch

  11. Memory Transfer Test
  • ECT (Extended Copy Transfer): characterizes memory performance to observe the impact of the OS on memory performance
  • Locality of reference:
    • temporal locality – varying working-set size (block size)
    • spatial locality – varying access pattern (strides)
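A stride sweep of this kind can be sketched as follows (in the spirit of ECT, not the actual benchmark): fix a buffer much larger than the caches, then time accesses as the stride grows. In a compiled language, per-access cost rises sharply once the stride exceeds a cache line, exposing the spatial-locality cliff the test is after.

```python
# Sketch of an ECT-style stride test (illustrative; CPython's interpreter
# overhead dominates, but the access pattern matches what ECT varies).
import array
import time

buf = array.array('l', range(1 << 20))  # ~1M machine longs, far larger than L1/L2

def touch(stride: int):
    """Sum every stride-th element; return (elapsed seconds, checksum)."""
    start = time.perf_counter()
    total = 0
    for i in range(0, len(buf), stride):
        total += buf[i]
    return time.perf_counter() - start, total

for stride in (1, 16, 256):
    elapsed, _ = touch(stride)
    print(f"stride {stride:4d}: {elapsed * 1e3:.2f} ms")
```

Varying the buffer size instead of the stride probes temporal locality: once the working set exceeds a cache level, repeat passes stop getting cheaper.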

  12. Performance of streaming media servers
  • L1 Cache Performance
  [Charts: L1 cache misses at 56 kbps and at 300 kbps]
  • L1 cache misses are mostly influenced by the number of streams
  • Worst-case performance occurs when the number of streams is high: 300 kbps encoding rate and multiple media contents requested by clients

  13. Performance of streaming media servers
  • Memory Performance and Throughput
  [Charts: page fault rate and throughput at 300 kbps]
  • Requests for a unique media object do not incur many page faults, since the object can easily be served from memory
  • Requests for multiple objects lead to a high page fault rate, since many data blocks have to be fetched from disk
  • A high page fault rate leads to client timeouts due to long delays
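Page-fault counts of the kind plotted here can be observed from a process itself; a minimal sketch (Unix-only, using the standard `resource` module — not the study's instrumentation, which used system-wide metrics) touches a run of fresh pages and reads the minor-fault counter before and after:

```python
# Sketch: watching a process's own minor page faults via getrusage (Unix-only).
import resource

before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt

data = bytearray(16 * 1024 * 1024)           # 16 MB of freshly mapped pages
data[::4096] = b'\x01' * len(data[::4096])   # touch one byte in each 4 KB page

after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
print(f"minor page faults incurred: {after - before}")
```

Major faults (`ru_majflt`), which require a disk read, are the expensive kind driving the client timeouts described above.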

  14. Performance of Web servers
  • Transactions and Throughput
  • Smaller files are transferred within a short time, hence more connections are established and released at a high rate
  [Chart: number of transactions per second]
  • For larger files, throughput is high even though transactions/sec are low (fewer connections made)
  [Chart: throughput in Mbytes/sec]

  15. Performance of Web servers
  • Cache Performance
  • L1 and L2 cache performance is poor when the document size is small. WHY?
  [Charts: L1 cache misses and L2 cache misses]

  16. Performance of Web servers
  • Page Faults and Latency
  • Unlike a small file, a large file has to be continuously fetched from disk, leading to more page faults
  [Chart: page fault rate]
  • Large files significantly increase the average latency of the server. As clients wait too long, they may time out
  [Chart: latency]

  17. Performance of IP forwarding
  • Experimental setup
  • Routing (creating and updating the routing table) is done by 'routed'
  • IP forwarding is done in Linux kernel space

  18. Performance of IP forwarding
  • Routing configurations:
    • 1 and 2: 1-1 communication (simplex and duplex)
    • 3 and 4: double 1-1 communication (simplex and duplex)
    • 7 and 8: ring communication (simplex and duplex)
    • 5 and 6: 1-4 communication (simplex and duplex)

  19. Performance of IP forwarding • Bandwidth

  20. Performance of IP forwarding
  • Maximum bandwidth: 449 Mbps
    • at configuration 2 – only two NICs involved in the router
    • CPU utilization (system) – a mere 19.04%
    • context switches – 1312 (only two NICs switched)
    • Active pages – 1006.48 (highest observed)
  • A very small packet size (64 bytes) degrades performance
    • It accounts for the highest context switching
  • Fairly uniformly distributed active-page figures indicate that memory activity is not very intensive

  21. Performance of IP forwarding • Other metrics

  22. Conclusion
  Streaming servers:
  • Performance is highly degraded due to cache misses and page faults
  • Streaming uses continuous data with a large working set and poor temporal locality (no data reuse)
  Web servers:
  • A small working set does not help much, as frequent connection setup and teardown degrades performance significantly
  • When the document is large, the server delay becomes unacceptably high, leading to client timeouts
  • A large document size also leads to a high page fault rate

  23. Conclusion
  IP forwarding:
  • Memory performance is not the main factor in the overall performance of IP forwarding in the Linux kernel
  • Context-switching overhead is highly significant and a key factor in performance degradation. The more interfaces involved in forwarding packets, the greater the contention for resources (bus contention)
  • All CPU activity (kernel space only) is below 100%. If bus contention is resolved, more throughput can be obtained

  24. Thank you
