This report discusses the key contributions of CPU architecture and system design that reduce data latency, focusing on memory-intensive applications. Innovations such as a large shared last-level cache, an integrated memory controller, and QuickPath Interconnect (QPI) enhance data movement and improve memory access times. Notably, the Nehalem architecture's advances in branch prediction and cache line handling significantly improve performance, particularly in database benchmarks. Moreover, with increased physical memory and faster I/O, overall system efficiency and IOPS improve considerably.
Data Latency
Rich Altmaier, Software and Services Group
CPU Architecture Contribution
• Data intensive == memory latency bound
• Minimal cache line use and reuse
• Often pointer chasing – hard to prefetch (see the sketch below)
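A minimal sketch of why pointer chasing is latency bound: in the array walk the next address is known in advance, so the hardware prefetcher can run ahead, while in the shuffled linked list every load must wait for the previous cache line to arrive. The list length, node layout, and shuffle are illustrative choices, not anything from the slides.

```c
/* Sketch: a sequential array walk (prefetcher-friendly) vs. a pointer
 * chase through a shuffled linked list (each load waits on the last). */
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

struct node { struct node *next; long payload; };

int main(void) {
    /* Sequential access: the next address is predictable, so hardware
     * prefetch can hide most of the memory latency. */
    long *arr = malloc(N * sizeof *arr);
    long sum = 0;
    for (long i = 0; i < N; i++) arr[i] = i;
    for (long i = 0; i < N; i++) sum += arr[i];

    /* Pointer chasing: link the nodes in a random order so the next
     * address is unknown until the current cache line arrives. */
    struct node *nodes = malloc(N * sizeof *nodes);
    long *perm = malloc(N * sizeof *perm);
    for (long i = 0; i < N; i++) perm[i] = i;
    for (long i = N - 1; i > 0; i--) {        /* simple shuffle */
        long j = rand() % (i + 1);
        long t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (long i = 0; i < N; i++) {
        nodes[perm[i]].payload = i;
        nodes[perm[i]].next = (i + 1 < N) ? &nodes[perm[i + 1]] : NULL;
    }

    long chased = 0;
    for (struct node *p = &nodes[perm[0]]; p; p = p->next)
        chased += p->payload;   /* every iteration stalls on a dependent load */

    printf("sum=%ld chased=%ld\n", sum, chased);
    free(arr); free(nodes); free(perm);
    return 0;
}
```

Timing the two loops (with the working set larger than the last-level cache) shows the list walk dominated by memory latency rather than compute.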
CPU Architecture Contribution
• Large instruction cache
  • Captures a sophisticated code loop, especially in databases
• Shared last-level cache across cores
  • Nehalem added this for both instructions and data (I & D)
  • Without sharing, each core holds its own copy of instruction lines, and data and lock lines must move between per-core caches (see the sketch below)
• Integrated memory controller
  • Big latency win in Nehalem
• QPI for socket-to-socket cache line movement
  • Introduced in Nehalem, faster than the front-side bus (FSB)
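To make the cache-line-movement point concrete, here is a small sketch (not from the slides) of a line bouncing between private caches: two threads update counters that sit on the same line, then the same work is repeated with the counters padded onto separate lines. The 64-byte line size, iteration count, and pthread setup are assumptions for the example; compile with -pthread and compare the two timings.

```c
/* Sketch of cache line movement between cores (false sharing).
 * Assumes 64-byte cache lines; compile with: cc -O2 -pthread demo.c */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 50000000L

/* Both counters typically land on the same 64-byte line. */
struct { volatile long a; volatile long b; } packed;
/* 64 bytes of padding guarantees the counters are on different lines. */
struct { volatile long a; char pad[64]; volatile long b; } padded;

static void *bump_packed_a(void *x) { (void)x; for (long i = 0; i < ITERS; i++) packed.a++; return NULL; }
static void *bump_packed_b(void *x) { (void)x; for (long i = 0; i < ITERS; i++) packed.b++; return NULL; }
static void *bump_padded_a(void *x) { (void)x; for (long i = 0; i < ITERS; i++) padded.a++; return NULL; }
static void *bump_padded_b(void *x) { (void)x; for (long i = 0; i < ITERS; i++) padded.b++; return NULL; }

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void run(void *(*fa)(void *), void *(*fb)(void *), const char *label) {
    pthread_t t1, t2;
    double t0 = now();
    pthread_create(&t1, NULL, fa, NULL);
    pthread_create(&t2, NULL, fb, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%-45s %.2f s\n", label, now() - t0);
}

int main(void) {
    run(bump_packed_a, bump_packed_b, "counters share a line (line ping-pongs):");
    run(bump_padded_a, bump_padded_b, "counters on separate lines:");
    return 0;
}
```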
CPU Architecture Contribution
• Improvements in branch prediction
  • Successful prediction of more complex branching structures
• Total number of outstanding cache line reads per socket
  • Improved in Nehalem
  • Exploited by out-of-order execution
  • Exploited by Hyper-Threading (database benchmarks usually enable it and win)
  • Opportunity to tune data structures for parallel reading (sketched below)
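A sketch of what "tuning data structures for parallel reading" can mean in code: a single linked-list walk exposes only one dependent cache line read at a time, whereas interleaving several independent lists lets out-of-order execution keep multiple reads outstanding per iteration. The four-way interleave and toy list sizes are illustrative assumptions, not from the slides.

```c
/* Sketch of memory-level parallelism: one dependent pointer chase vs.
 * an interleaved walk over four independent lists. */
#include <stdio.h>
#include <stdlib.h>

struct node { struct node *next; long payload; };

/* Serial chase: each load depends on the previous one, so at most one
 * cache line read is outstanding at a time. */
static long chase_one(struct node *p) {
    long sum = 0;
    for (; p; p = p->next)
        sum += p->payload;
    return sum;
}

/* Interleaved chase: the four lists are independent, so out-of-order
 * execution can keep up to four line reads in flight per iteration. */
static long chase_four(struct node *h0, struct node *h1,
                       struct node *h2, struct node *h3) {
    long sum = 0;
    while (h0 || h1 || h2 || h3) {
        if (h0) { sum += h0->payload; h0 = h0->next; }
        if (h1) { sum += h1->payload; h1 = h1->next; }
        if (h2) { sum += h2->payload; h2 = h2->next; }
        if (h3) { sum += h3->payload; h3 = h3->next; }
    }
    return sum;
}

int main(void) {
    /* Toy lists; a real measurement would shuffle the nodes and use a
     * working set much larger than the last-level cache. */
    enum { LEN = 1000, LISTS = 4 };
    struct node *heads[LISTS];
    for (int l = 0; l < LISTS; l++) {
        struct node *n = calloc(LEN, sizeof *n);
        for (int i = 0; i < LEN; i++) {
            n[i].payload = i;
            n[i].next = (i + 1 < LEN) ? &n[i + 1] : NULL;
        }
        heads[l] = n;
    }
    long serial = 0;
    for (int l = 0; l < LISTS; l++)
        serial += chase_one(heads[l]);
    long interleaved = chase_four(heads[0], heads[1], heads[2], heads[3]);
    printf("serial=%ld interleaved=%ld\n", serial, interleaved);
    return 0;
}
```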
System Architecture Contribution
• Larger physical memory
• Faster memory (lower latency)
• Faster I/O, and more ports, for data movement
• SSDs – big boost to IOPS (I/Os per second); see the sketch after this list
  • Filesystem reads and writes are usually small and scattered
  • No big sequential ops
• Faster networking
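The SSD/IOPS bullet is about access-pattern shape rather than bandwidth. The sketch below issues many small reads at random, block-aligned offsets, the kind of scattered filesystem traffic that SSD IOPS accelerate; the file name, 4 KB block size, and read count are assumptions for the example.

```c
/* Sketch of an IOPS-style access pattern: many small reads at random
 * offsets, with no large sequential runs to stream. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    const char *path = "testfile.dat";      /* hypothetical test file */
    const size_t block = 4096;              /* small, scattered 4 KB reads */
    const long count = 10000;

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    off_t size = lseek(fd, 0, SEEK_END);
    if (size < (off_t)block) { fprintf(stderr, "file too small\n"); return 1; }

    char *buf = malloc(block);
    long completed = 0;
    for (long i = 0; i < count; i++) {
        /* Random block-aligned offset: the device sees scattered requests. */
        off_t off = ((off_t)(rand() % (size / block))) * block;
        if (pread(fd, buf, block, off) == (ssize_t)block)
            completed++;
    }
    printf("%ld small random reads completed\n", completed);
    free(buf);
    close(fd);
    return 0;
}
```

A real benchmark would also open the file with O_DIRECT (or drop the page cache) so reads actually reach the device, and would use several threads or asynchronous I/O to keep the SSD's queue full.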
Summary
• Large & shared cache
• Latency reduction with the integrated memory controller and socket-to-socket QPI
• Total number of outstanding reads
• Branch prediction
• Storage configured for IOPS