
Data Latency



Presentation Transcript


  1. Data Latency
     Rich Altmaier, Software and Services Group

  2. CPU Architecture Contribution
     • Data-intensive == memory-latency bound
     • Minimal cache line use and reuse
     • Often pointer chasing – hard to prefetch

  3. CPU Architecture Contribution
     • Large instruction cache
       – Captures a sophisticated code loop, especially in databases
     • Shared last-level cache across cores
       – Nehalem added this for both instructions (I) and data (D)
       – Without sharing, each core keeps its own copy of instructions, and lock cache lines must move between per-core caches
     • Integrated memory controller
       – Big latency win in Nehalem
     • QPI for socket-to-socket cache line movement
       – Introduced in Nehalem; faster than the front-side bus (FSB)

  4. CPU Architecture Contribution
     • Improvements in branch prediction
       – Successful prediction of more complex branching structures
     • More outstanding cache line reads per socket
       – Improved in Nehalem
       – Exploited by out-of-order execution
       – Exploited by Hyper-Threading (database benchmarks usually enable it and win)
       – Opportunity to tune data structures for parallel reading

  5. System Architecture Contribution
     • Larger physical memory
     • Faster memory (lower latency)
     • Faster I/O, and more ports, for data movement
     • SSDs – big boost to IOPS (I/Os per second)
       – Filesystem reads and writes are usually small and scattered, not big sequential ops
     • Faster networking

  6. Summary
     • Large, shared cache
     • Latency reduction from the integrated memory controller and socket-to-socket QPI
     • More outstanding reads in flight
     • Better branch prediction
     • Storage configured for IOPS
