DMA Cache: Architecturally Separate I/O Data from CPU Data for Improving I/O Performance



Presentation Transcript


1. DMA Cache: Architecturally Separate I/O Data from CPU Data for Improving I/O Performance. Dan Tang, Yungang Bao, Weiwu Hu, Mingyu Chen. January 2010.

2. The Role of I/O. I/O is ubiquitous: loading binary files (Disk → Memory), browsing the web, streaming media (Network → Memory), … I/O is significant: many commercial applications, e.g., databases, are I/O intensive.

3. State-of-the-Art I/O Technologies. I/O buses (20 GB/s): PCI Express 2.0, HyperTransport 3.0, QuickPath Interconnect. I/O devices: SSD RAID at 1.2 GB/s; 10GbE at 1.25 GB/s; Fusion-io at 8 GB/s and 1M IOPS (2 KB random, 70/30 read/write mix).

4. Direct Memory Access (DMA). DMA is used for I/O operations in all modern computers. DMA allows I/O subsystems to access system memory independently of the CPU. Many I/O devices have DMA engines, including disk drive controllers, graphics cards, network cards, sound cards, and GPUs.
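To make the mechanism concrete, here is a minimal C sketch of a driver kicking off a transfer on a hypothetical memory-mapped DMA engine. The base address, register layout, and function names are assumptions for illustration, not the interface of any real controller.

```c
/* Minimal sketch of programming a hypothetical memory-mapped DMA engine.
 * The base address and register layout are illustrative assumptions,
 * not the interface of any real controller. */
#include <stdint.h>

#define DMA_BASE  0xFE000000u   /* assumed MMIO base of the DMA engine */
#define DMA_SRC   0x00u         /* source physical address register   */
#define DMA_DST   0x04u         /* destination physical address       */
#define DMA_LEN   0x08u         /* transfer length in bytes           */
#define DMA_CTRL  0x0Cu         /* control register, bit 0 = start    */

static inline void mmio_write32(uintptr_t addr, uint32_t val)
{
    *(volatile uint32_t *)addr = val;   /* uncached device register write */
}

/* Program the engine to copy 'len' bytes from a device buffer into main
 * memory; the data movement itself then proceeds without the CPU. */
void dma_start(uint32_t src_phys, uint32_t dst_phys, uint32_t len)
{
    mmio_write32(DMA_BASE + DMA_SRC,  src_phys);
    mmio_write32(DMA_BASE + DMA_DST,  dst_phys);
    mmio_write32(DMA_BASE + DMA_LEN,  len);
    mmio_write32(DMA_BASE + DMA_CTRL, 1u);      /* kick off the transfer */
}
```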

5. Outline Revisiting I/O DMA Cache Design Evaluations Conclusions

6. (figure-only slide, no transcript text)

8. Problems of the Shared-Cache Scheme. Cache pollution; cache thrashing; not suitable for other I/O; can degrade performance when DMA requests are large (>100 KB), as seen with the "Oracle + TPC-H" workload. The processor's LLC treats all data equally, which may cause cache pollution and thrashing, especially when the I/O data is large (over 100 KB). Our experiments show that the shared-cache scheme can degrade performance when DMA requests are large for "Oracle + TPC-H"; a toy simulation of this pollution effect is sketched below.
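The following self-contained C sketch models the pollution effect: a CPU working set that fits comfortably in a small direct-mapped "LLC" is swept repeatedly, with and without a large interleaved DMA stream sharing the same cache. The cache geometry, working-set size, and stream length are illustrative assumptions, not the paper's experimental setup.

```c
/* Toy illustration of LLC pollution by large DMA streams: a CPU working
 * set that fits in a small direct-mapped "LLC" is swept repeatedly,
 * with and without an interleaved streaming DMA access sharing the cache. */
#include <stdio.h>
#include <stdint.h>

#define NUM_SETS 1024            /* 1024 sets x 64 B blocks = 64 KB cache */

static uint64_t tags[NUM_SETS];
static int      valid[NUM_SETS];

static int access_block(uint64_t blk)   /* returns 1 on hit, fills on miss */
{
    unsigned set = blk % NUM_SETS;
    if (valid[set] && tags[set] == blk) return 1;
    valid[set] = 1;
    tags[set]  = blk;
    return 0;
}

static void reset(void) { for (int i = 0; i < NUM_SETS; i++) valid[i] = 0; }

static double cpu_hit_rate(int share_with_dma)
{
    reset();
    long hits = 0, refs = 0;
    uint64_t dma_blk = 1u << 20;         /* DMA blocks live far away */
    for (int round = 0; round < 16; round++) {
        for (uint64_t blk = 0; blk < 512; blk++) {   /* 32 KB working set */
            hits += access_block(blk);
            refs++;
            if (share_with_dma)
                access_block(dma_blk++);             /* streaming I/O data */
        }
    }
    return (double)hits / (double)refs;
}

int main(void)
{
    printf("CPU hit rate, private LLC : %.2f\n", cpu_hit_rate(0));
    printf("CPU hit rate, shared w/DMA: %.2f\n", cpu_hit_rate(1));
    return 0;
}
```

With these toy parameters, the CPU hit rate drops from roughly 94% (private cache) to under 50% once the streaming I/O data shares the cache, which is the pollution/thrashing behavior the slide describes.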

9. I/O Data vs. CPU Data

10. A short ad for HMTT [Bao-Sigmetrics08], a hardware/software hybrid memory trace tool. It supports the DDR2 DIMM interface on multiple platforms, collects full-system off-chip memory traces, and provides traces with semantic information (e.g., virtual address, process ID, I/O operation). It can trace commercial applications such as Oracle and web servers.

11. Characteristics of I/O Data (1): % of memory references to I/O data; % of references by I/O type.

12. Characteristics of I/O Data (2): I/O request size distribution.

13. Characteristics of I/O Data (3): Sequential access in I/O data. Compared with CPU data, I/O data accesses are very regular.

14. Characteristics of I/O Data (4): Reuse distance (RD), i.e., LRU stack distance.
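For readers unfamiliar with the metric, here is a small, self-contained C sketch that computes LRU stack distance (reuse distance) for a trace of block addresses. The O(n·depth) linear-scan stack and the tiny example trace are illustrative choices; a real trace tool would use a tree-based algorithm for large traces.

```c
/* Illustrative sketch: compute LRU stack distance (reuse distance)
 * for a trace of block addresses. */
#include <stdio.h>
#include <stdint.h>

#define MAX_BLOCKS 4096
static uint64_t stack[MAX_BLOCKS];   /* stack[0] = most recently used */
static int depth = 0;

/* Returns the reuse distance of 'blk' (-1 on first reference),
 * then moves 'blk' to the top of the LRU stack. */
long reuse_distance(uint64_t blk)
{
    long dist = -1;
    for (int i = 0; i < depth; i++) {
        if (stack[i] == blk) { dist = i; break; }
    }
    if (dist < 0 && depth < MAX_BLOCKS)
        depth++;                             /* cold miss: stack grows */
    int stop = (dist < 0) ? depth - 1 : (int)dist;
    for (int i = stop; i > 0; i--)           /* shift entries down one slot */
        stack[i] = stack[i - 1];
    stack[0] = blk;                          /* blk becomes most recently used */
    return dist;
}

int main(void)
{
    uint64_t trace[] = { 0x10, 0x20, 0x10, 0x30, 0x20, 0x10 };
    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
        printf("block %#llx -> distance %ld\n",
               (unsigned long long)trace[i], reuse_distance(trace[i]));
    return 0;
}
```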

15. Characteristics of I/O Data (5)

16. Rethinking I/O & DMA Operations. 20~40% of memory references are to I/O data in I/O-intensive applications. The characteristics of I/O data differ from those of CPU data: there is an explicit producer-consumer relationship for I/O data; the reuse distance of I/O data is smaller than that of CPU data; references to I/O data are primarily sequential. ⇒ Separate I/O data from CPU data.

17. Separating I/O data and CPU data

18. Outline: Revisiting I/O; DMA Cache Design; Evaluations; Conclusions. This slide gives an overview of the presentation.

19. DMA Cache Design Issues Write Policy Cache Coherence Replacement Policy Prefetching

20. DMA Cache Design Issues Write Policy Cache Coherence Replacement Policy Prefetching

21. DMA Cache Design Issues Write Policy Cache Coherence Replacement Policy Prefetching

22. A Big Issue: How do we prove the correctness of integrating heterogeneous cache coherence protocols in one system?

23. A Global State Method for Heterogeneous Cache Coherence Protocol [Pong-SPAA93, Pong-JACM98]

24. Global State Cache Coherence Theorem. Given N (N>1) well-defined cache protocols, they do not conflict if and only if there does not exist any conflict global state in the global state transition machine.

25. MOESI + ESI
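To illustrate the global-state idea on the MOESI (CPU LLC) + ESI (DMA cache) combination, the sketch below enumerates every per-block state pair and flags "conflict global states". The compatibility rules encoded here, namely that an exclusive holder forbids any valid copy elsewhere and that the DMA cache's E state is treated as exclusive and writable, are simplified assumptions for illustration, not the paper's full protocol tables.

```c
/* Illustrative sketch of the global-state method: enumerate every
 * combination of (LLC state, DMA-cache state) for one block and flag
 * "conflict global states" that break single-writer / valid-copy
 * invariants.  The rules below are a simplified reading of MOESI/ESI. */
#include <stdio.h>
#include <stdbool.h>

typedef enum { M, O, E, S, I } llc_state_t;   /* CPU LLC: MOESI */
typedef enum { DE, DS, DI } dma_state_t;      /* DMA cache: ESI */

static const char *llc_name[] = { "M", "O", "E", "S", "I" };
static const char *dma_name[] = { "E", "S", "I" };

static bool llc_exclusive(llc_state_t s) { return s == M || s == E; }
static bool llc_valid(llc_state_t s)     { return s != I; }
static bool dma_exclusive(dma_state_t s) { return s == DE; }
static bool dma_valid(dma_state_t s)     { return s != DI; }

/* A global state conflicts if one cache claims exclusivity while the
 * other still holds any valid copy of the same block. */
static bool conflict(llc_state_t c, dma_state_t d)
{
    if (llc_exclusive(c) && dma_valid(d)) return true;
    if (dma_exclusive(d) && llc_valid(c)) return true;
    return false;
}

int main(void)
{
    for (int c = M; c <= I; c++)
        for (int d = DE; d <= DI; d++)
            printf("(LLC=%s, DMA=%s) %s\n", llc_name[c], dma_name[d],
                   conflict(c, d) ? "CONFLICT" : "ok");
    return 0;
}
```

A real verification, as in the cited global-state work, would also enumerate the transitions between these global states to show that no conflict state is reachable, not just list the static combinations.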

26. DMA Cache Design Issues Write Policy Cache Coherence Replacement Policy Prefetching

27. DMA Cache Design Issues Write Policy Cache Coherence Replacement Policy Prefetching

28. Design Complexity vs. Design Cost

29. Outline Revisiting I/O DMA Cache Design Evaluations Conclusions

30. Speedup of Dedicated DMA Cache

31. % of Valid Prefetched Blocks. This slide shows that straightforward sequential prefetching for the DMA cache is very effective: it achieves impressively high prefetching accuracy.
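As a rough illustration of why simple sequential prefetching works so well on sequential DMA streams, here is a self-contained C sketch of a direct-mapped DMA cache that pulls in the next few blocks on every miss. The block size, prefetch degree, and synthetic trace are assumptions for illustration, not the paper's configuration.

```c
/* Minimal, self-contained sketch of sequential (next-N-block) prefetching
 * into a small DMA cache, modeled as a direct-mapped tag array. */
#include <stdio.h>
#include <stdint.h>

#define NUM_SETS        64
#define BLOCK_SHIFT      6      /* 64-byte blocks */
#define PREFETCH_DEGREE  3      /* extra blocks brought in per miss */

static uint64_t tags[NUM_SETS];
static int      valid[NUM_SETS];

static int lookup(uint64_t blk)
{
    unsigned set = blk % NUM_SETS;
    return valid[set] && tags[set] == blk;
}

static void fill(uint64_t blk)
{
    unsigned set = blk % NUM_SETS;
    valid[set] = 1;
    tags[set]  = blk;
}

int main(void)
{
    /* A sequential DMA stream, as produced by a large I/O request. */
    int hits = 0, misses = 0;
    for (uint64_t addr = 0; addr < 64 * 1024; addr += 64) {
        uint64_t blk = addr >> BLOCK_SHIFT;
        if (lookup(blk)) {
            hits++;
        } else {
            misses++;
            for (int i = 0; i <= PREFETCH_DEGREE; i++)  /* demand + prefetch */
                fill(blk + (uint64_t)i);
        }
    }
    printf("hits=%d misses=%d\n", hits, misses);
    return 0;
}
```

On this synthetic 64 KB sequential stream, demand misses drop by a factor of PREFETCH_DEGREE + 1 and every prefetched block is eventually used, i.e., prefetch accuracy is effectively 100% for a purely sequential access pattern.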

32. Performance Comparisons. Although PBDC does not require additional on-chip storage, it achieves about 80% of DDC's performance improvement.

33. Outline Revisiting I/O DMA Cache Design Evaluations Conclusions

34. Conclusions. We have proposed a DMA cache technique to separate I/O data from CPU data. We adopt a global-state method for integrating heterogeneous cache coherence protocols. Experimental results show that DMA cache schemes outperform the existing approaches that use a unified, shared cache for I/O data and CPU data. Open problems remain, e.g.: Can I/O data go directly to the L1 cache? How should heterogeneous caches be designed for different types of data? How can the memory controller be optimized with awareness of I/O?

35. Thanks! Questions?

36. Design Complexity of PBDC

37. More References on Cache Coherence Protocol Verification. Fong Pong and Michel Dubois. Formal verification of complex coherence protocols using symbolic state models. Journal of the ACM, 45(4):557-587, July 1998. Fong Pong and Michel Dubois. Verification techniques for cache coherence protocols. ACM Computing Surveys, 29(1):82-126, March 1997.
