
MS Thesis Defense


Presentation Transcript


  1. MS Thesis Defense “Improving Performance, Power, and Security of Multicore Systems using Cache Organization” By Tania Jareen CoE EECS Department April 21, 2014

  2. About Me • Tania Jareen • MS in Electrical Engineering with Thesis • GTA for Routing and Switching–II • Publications: • “An Effective Locking-Free Caching Technique for Power-Aware Multicore Computing Systems,” accepted at the IEEE ICIEV-2014 conference. • “A Novel Level-1 Cache Mapping Approach to Improve System Security without Compromising Performance to Power Ratio,” in preparation.

  3. Committee Members • Dr. Abu Asaduzzaman, EECS Dept. • Dr. Ramazan Asmatulu, ME Dept. • Dr. Zheng Chen, EECS Dept.

  4. “Improving Performance, Power, and Security of Multicore Systems using Cache Organization” Outline ► • Introduction • Problem Statement • Some Important Terms • Previous Work • Proposal • Simulation • Simulation Results • Conclusions • Future Work Q U E S T I O N S ? Any time, please.

  5. Introduction • Multicore System • A multicore system is a collection of parallel or concurrent processing units that divides a large, complex problem into many small tasks • Main goal: to solve a complex problem faster Dual-core System

  6. Problem Statement • Challenges for Multicore System • High Average Memory Latency • High Total Power Consumption • Cache Side Channel Security Attack

  7. Contributions • Propose a multicore system design to reduce the average memory latency • Propose a multicore system design to reduce the total power consumption • Propose a multicore system design to provide hardware level security

  8. Some Important Terms • Cache • A small buffer that stores recently used information • Helps to bridge the speed gap between the processor and main memory • Significantly increases the overall performance of the system • Logically, the cache sits between the CPU and main memory Cache and Main Memory (Computer Desktop Encyclopedia)

  9. Some Important Terms • Cache Organization • Cache Hit – the requested data is present in the cache • Cache Miss – the requested data is not present in the cache
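The hit/miss distinction can be illustrated with a minimal direct-mapped cache lookup (a sketch with arbitrary illustrative sizes, not the thesis simulator):

```python
# Minimal direct-mapped cache sketch (hypothetical sizes): a lookup is a
# hit when the stored tag matches, and a miss otherwise.
NUM_SETS = 4                       # illustrative cache size
tags = [None] * NUM_SETS           # one tag per set

def access(block_addr):
    """Return 'hit' or 'miss' for a block address, updating the cache."""
    index = block_addr % NUM_SETS  # which set the block maps to
    tag = block_addr // NUM_SETS   # identifies the block within that set
    if tags[index] == tag:
        return "hit"
    tags[index] = tag              # bring the block in on a miss
    return "miss"

print(access(5))    # miss: cache starts empty
print(access(5))    # hit: block 5 is now cached
print(access(13))   # miss: 13 maps to the same set as 5 and evicts it
```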

  10. Some Important Terms • Cache Replacement Policy • Because cache size is limited, some blocks must be replaced to make room for new blocks • Replacement should be done so that the miss ratio stays low • Some well-known cache replacement policies: Least Recently Used (LRU), Random, Most Recently Used (MRU), First-In First-Out (FIFO), etc. Cache Replacement Policy (Aaron Toponce)
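LRU, the policy assumed later in the simulations, can be sketched in a few lines (an illustrative model, not the thesis simulator):

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache sketch: evicts the least recently used block when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # keys ordered oldest -> newest

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)       # mark as most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)      # evict least recently used
        self.blocks[block] = True
        return "miss"

lru = LRUCache(2)
lru.access("A"); lru.access("B")
lru.access("A")            # touch A, so B becomes least recently used
lru.access("C")            # evicts B, not A
print(lru.access("A"))     # hit
print(lru.access("B"))     # miss: B was evicted
```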

  11. Some Important Terms • Memory Update Policy • A combination of a read policy and a write policy • Read Policy – indicates how a word is read • Write Policy – indicates how a write to a memory block is handled. Examples: Write-Through, Write-Back
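The difference between the two write policies can be sketched as follows (a toy model using dictionaries in place of real cache and memory):

```python
memory = {}   # stands in for main memory
cache = {}    # block -> (value, dirty-flag)

def write_through(block, value):
    """Write-Through: update the cache and main memory together."""
    cache[block] = (value, False)
    memory[block] = value

def write_back(block, value):
    """Write-Back: update only the cache and mark the block dirty."""
    cache[block] = (value, True)

def evict(block):
    """On eviction, a dirty block must be written back to main memory."""
    value, dirty = cache.pop(block)
    if dirty:
        memory[block] = value

write_through("A", 1)
write_back("B", 2)
print(memory.get("B"))   # None: memory not yet updated under write-back
evict("B")
print(memory.get("B"))   # 2: written back on eviction
```

Write-back reduces memory traffic (one write on eviction instead of one per store), which is why it is the update policy assumed in the simulations later.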

  12. Some Important Terms • Cache Locking • Locks the blocks expected to be most used in the future • Locked blocks are not replaced during cache replacement • Increases the hit ratio and performance • Reduces average memory access time and power consumption • Problems: hard to predict which blocks to lock; not all processor configurations support locking; reduces the effective cache size Locked Cache
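In a sketch, locking simply means the eviction logic skips locked blocks (a hypothetical illustration, not the locking scheme evaluated in the thesis):

```python
class LockingCache:
    """LRU-style cache sketch in which locked blocks are never evicted."""
    def __init__(self, capacity, locked):
        self.capacity = capacity
        self.locked = set(locked)   # blocks pinned in the cache
        self.order = []             # LRU order, oldest first

    def access(self, block):
        if block in self.order:
            self.order.remove(block)
            self.order.append(block)   # most recently used
            return "hit"
        if len(self.order) >= self.capacity:
            # Evict the oldest *unlocked* block.
            victim = next(b for b in self.order if b not in self.locked)
            self.order.remove(victim)
        self.order.append(block)
        return "miss"

cache = LockingCache(capacity=2, locked={"HOT"})
cache.access("HOT"); cache.access("X")
cache.access("Y")            # evicts X; HOT is locked
print(cache.access("HOT"))   # hit: the locked block survived
```

Note that unlocked blocks now compete for fewer slots, which mirrors the "reduces effective cache size" drawback on the slide.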

  13. Some Important Terms • Victim Cache • One of the oldest and most popular techniques to improve performance • Placed between CL1 and CL2 • Holds the victim blocks evicted during cache replacement • Reduces average memory latency and total power consumption Victim Cache Organization
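The idea can be sketched with a direct-mapped L1 backed by a small FIFO victim buffer (illustrative sizes; a toy model, not the thesis design):

```python
from collections import deque

class L1WithVictimCache:
    """Sketch: a direct-mapped L1 plus a small fully-associative victim buffer.

    Blocks evicted from L1 go to the victim buffer; a hit there avoids
    the longer trip to CL2 / main memory.
    """
    def __init__(self, num_sets=4, victim_entries=2):
        self.sets = [None] * num_sets
        self.num_sets = num_sets
        self.victims = deque(maxlen=victim_entries)  # FIFO victim buffer

    def access(self, block):
        index = block % self.num_sets
        if self.sets[index] == block:
            return "L1 hit"
        if block in self.victims:
            self.victims.remove(block)               # swap back into L1
            if self.sets[index] is not None:
                self.victims.append(self.sets[index])
            self.sets[index] = block
            return "victim hit"
        if self.sets[index] is not None:
            self.victims.append(self.sets[index])    # save the evicted block
        self.sets[index] = block
        return "miss"

c = L1WithVictimCache()
c.access(1)
c.access(5)          # conflicts with 1; block 1 moves to the victim buffer
print(c.access(1))   # victim hit: served without going to CL2
```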

  14. Some Important Terms • Stream Buffering • On a cache miss, the required blocks along with some additional blocks come from main memory to CL2 and are then copied to CL1 • The additional blocks are kept in a Stream Buffer • Helps to reduce average memory latency and total power consumption
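A minimal sketch of the prefetching behavior (a toy model with a hypothetical buffer depth, not the thesis mechanism):

```python
from collections import deque

class StreamBuffer:
    """Sketch: on a cache miss, prefetch the next few sequential blocks
    into a small stream buffer so later sequential accesses hit there."""
    def __init__(self, depth=3):
        self.depth = depth
        self.buffer = deque(maxlen=depth)
        self.cache = set()   # stands in for CL1/CL2 contents

    def access(self, block):
        if block in self.cache:
            return "cache hit"
        if block in self.buffer:
            self.cache.add(block)          # promote from the stream buffer
            return "stream-buffer hit"
        # Miss: fetch the block and prefetch the next `depth` blocks.
        self.cache.add(block)
        self.buffer.clear()
        self.buffer.extend(range(block + 1, block + 1 + self.depth))
        return "miss"

sb = StreamBuffer()
print(sb.access(10))   # miss: prefetches blocks 11, 12, 13
print(sb.access(11))   # stream-buffer hit
print(sb.access(12))   # stream-buffer hit
```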

  15. Some Important Terms • Cache Side Channel Attack • A hardware attack, mainly on the cache • Extracts sensitive information from the cache by passive monitoring • Exploits physical properties (e.g., timing variation, power consumption, acoustic variation, heat production) [1,2,3,4] • A silent, but highly dangerous, attack

  16. Some Important Terms • Asymmetric Encryption • Step 1: Receiver generates a private/public key pair and shares the public key with the sender • Step 2: Sender encrypts the information using the public key • Step 3: Sender sends the encrypted information to the receiver • Step 4: Receiver decrypts the information using its own private key Asymmetric Encryption
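The four steps can be traced with textbook RSA, one common asymmetric scheme, on toy numbers (for illustration only; such tiny primes offer no real security):

```python
# Textbook RSA with toy primes -- an illustration of the four steps only.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent (Python 3.8+ modular inverse)

# Step 1: receiver publishes the public key (e, n) and keeps d secret.
# Step 2: sender encrypts with the public key.
message = 65
ciphertext = pow(message, e, n)
# Step 3: sender transmits `ciphertext` to the receiver.
# Step 4: receiver decrypts with its private key.
recovered = pow(ciphertext, d, n)
print(recovered == message)    # True
```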

  17. Previous Work • To Improve Average Memory Latency and Total Power Consumption: • Victim cache between CL1 and main memory, with stream buffering [6] • Problem – no guarantee that the victim blocks are those with the highest miss counts • Selective Victim Caching [7] • Problem – may pollute the cache; requires prediction

  18. Previous Work • Selective Pre-Fetching [8] • Problem – requires a history of references • Cache Locking [9] • Problem – hard to predict which blocks will miss most; not all processor configurations support it • To Improve Cache Level Security: • Partitioned Cache [1] • Problem – cache underutilization; depends on software support • Dynamic Memory-to-Cache Remapping [5]

  19. Proposed Mechanism MCB = Miss Cache Block VCB = Victim Cache Block SBB = Stream Buffering Block BACMI = Block Address and Cache Miss Information SLLC = Shared Last Level Cache • Smart Victim Cache (SVC) Proposed Cache Organization with SVC

  20. Work Flow Diagram Work Flow Diagram

  21. Proposed Mechanism Maximum Number of BACMI entries for a given SVC with various MCB MCB = Miss Cache Block VCB = Victim Cache Block SBB = Stream Buffering Block BACMI = Block Address and Cache Miss Information

  22. Simulation • Assumptions • SVC can be enabled and disabled • All cores equally share SVC • LRU replacement policy is used • Write-Back update policy is used

  23. Simulation • Workload • Moving Picture Experts Group’s – 4 (MPEG-4) • Advanced Video Coding (H.264/AVC) • Matrix Inversion (MI) • Fast Fourier Transform (FFT) • H.264/AVC behaves similar to MPEG-4 • MI behaves similar to FFT

  24. Simulation • Input Parameters • Number of cores = 4 • SVC size = 2, 4, 8, 16, 32 KB • I1/D1 size of CL1 = 8/8, 16/16, 32/32, 64/64, 128/128 KB • CL2 size = 256, 512, 1024, 2048, 4096 KB • Line size = 16, 32, 64, 128, 256 B • Associativity level = 1-, 2-, 4-, 8-, 16-way

  25. Simulation • Assumption for Delay Penalty • Number of cycles for a load or store operation = 100 • Number of cycles for a branch operation = 150

  26. Simulation • Assumption for Power Consumption

  27. Simulation Results • Impact of SVC Size

  28. Simulation Results • Impact of SVC and CL1 Size Impact of SVC and CL1 Size on Memory Latency and Total Power Consumption • Both latency and total power consumption decrease for MPEG-4 as the cache size increases • For MPEG-4, latency and power decrease the most with SVC and no locking

  29. Simulation Results • Impact of SVC and Line Size Impact of SVC and Line Size on Memory Latency and Total Power Consumption • As the line size increases, MPEG-4 latency and power consumption decrease • For MPEG-4, latency and power consumption decrease the most with SVC and no locking

  30. Simulation Results • Impact of SVC and Associativity Level Impact of SVC and Associativity Level on Memory Latency and Total Power Consumption • For MPEG-4, latency and power consumption decrease as the associativity level increases • For MPEG-4, latency and power consumption decrease the most with SVC and no locking

  31. Simulation Results • Impact of SVC and CL2/SLLC Size Impact of SVC and CL2/SLLC Size on Memory Latency and Total Power Consumption • As the CL2 size increases, MPEG-4 latency levels off but power consumption increases • For MPEG-4, both latency and power consumption decrease the most when SVC is used with no locking

  32. Simulation Results • Comparison of SVC and Cache Line Locking • Average memory latency and total power consumption decrease as CL2 locking increases from 0% to 25% • Both average memory latency and total power consumption decrease more with SVC and no locking than with locking, or with neither SVC nor locking

  33. Proposed Solution for Security Improvement • Randomized Cache Mapping Between D1X and CL1 (Solution-1) Randomized Cache Mapped Between D1X and CL1

  34. Proposed Solution for Security Improvement • Problems with Solution-1 • Requires extra hardware (D1X) • Increases memory latency • Increases total power consumption by about 17%

  35. Proposed Modified Solution for Security Improvement • Randomized Cache Mapping Between Main Memory and CL1 (Solution-2) Randomized Cache Mapping between CL1 and Main Memory • The odds of a successful cache side-channel attack are expected to drop to about 1 in 40K for 16 blocks of CL1
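The core idea behind randomized mapping can be sketched as follows (a toy illustration only, not the thesis mechanism: the point is that a secret random mapping hides the address-to-set relation that side-channel attacks exploit):

```python
import random

NUM_SETS = 16   # illustrative: matches the 16 CL1 blocks discussed above

# Conventional mapping: the set index is derived directly from the address,
# so an attacker who observes which set is touched learns about the address.
def fixed_index(block_addr):
    return block_addr % NUM_SETS

# Randomized mapping: each block gets a secret, per-run random set, so
# observing which set is touched reveals little about the address.
_secret_map = {}

def randomized_index(block_addr):
    if block_addr not in _secret_map:
        _secret_map[block_addr] = random.randrange(NUM_SETS)
    return _secret_map[block_addr]

print(fixed_index(42))        # always 10: predictable
print(randomized_index(42))   # some set in 0..15, unknown to the attacker
```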

  36. Conclusions • Using several levels of cache in multicore systems causes serious performance and power issues • A cache shared among the cores of a multicore system poses a hardware-level security threat • The proposed SVC significantly improves system performance by reducing memory latency and power consumption • The proposed cache randomization technique between main memory and CL1 significantly reduces the probability of a cache attack

  37. Conclusions • Average memory latency is reduced by 17% with SVC compared to CL2 cache locking • Total power consumption is reduced by 21% with SVC compared to CL2 cache locking • According to our estimates, the odds of a successful cache side-channel attack drop to about 1 in 40K for 16 blocks of CL1

  38. Future Work • Explore the impact of SVC on average memory latency and total power consumption for real-time embedded systems and handheld computers • Explore the randomized cache mapping between CL1 and main memory on real-time embedded systems and handheld computers

  39. “Improving Performance, Power, and Security of Multicore Systems using Cache Organization” QUESTION

  40. “Improving Performance, Power, and Security of Multicore Systems using Cache Organization” Thank You Contact: Full Name: Tania Jareen Telephone: (316) 516-8516 E-mail: txjareen@wichita.edu

  41. References • [1] D. Page, “Partitioned Cache Architecture as a Side-Channel Defense Mechanism,” Cryptology ePrint Archive, Report 2005/280, 2005. • [2] O. Aciicmez, “Yet Another MicroArchitectural Attack: Exploiting I-Cache,” in CSAW ’07: Proceedings of the 2007 ACM Workshop on Computer Security Architecture, pp. 11-18, DOI: 10.1145/1314466.1314469, 2007. • [3] P.C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems,” Springer Berlin Heidelberg, pp. 104-113, DOI: 10.1007/3-540-68697-5_9, 1996. • [4] P. Kocher et al., “Differential Power Analysis,” in Proceedings of the 19th Annual International Cryptology Conference on Advances in Cryptology, 1999.

  42. References • [5] Z. Wang and R.B. Lee, “A Novel Cache Architecture with Enhanced Performance and Security,” in Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture, 2008. • [6] N.P. Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” Western Research Laboratory (WRL), Digital Equipment Corporation, URL: https://www.cis.upenn.edu/~cis501/papers/jouppi-victim.pdf, 1990. • [7] D. Stiliadis and A. Varma, “Selective Victim Caching: A Method to Improve the Performance of Direct-Mapped Caches,” IEEE Transactions on Computers, Vol. 46, No. 5, pp. 603-610, DOI: 10.1109/12.589235, 1997.

  43. References • [8] R. Pendse and H. Katta, “Selective Prefetching: Prefetching When Only Required,” in the 42nd Midwest Symposium on Circuits and Systems, Vol. 2, pp. 866-869, DOI: 10.1109/MWSCAS.1999.867772, 1999. • [9] A. Asaduzzaman, F.N. Sibai, and M. Rani, “Improving Cache Locking Performance of Modern Embedded Systems via the Addition of a Miss Table at the L2 Cache Level,” Journal of Systems Architecture, Vol. 56, Issue 4-6, pp. 151-162, 2010.
