MS Thesis Defense “Improving Performance, Power, and Security of Multicore Systems using Cache Organization” By Tania Jareen CoE EECS Department April 21, 2014
About Me • Tania Jareen • MS in Electrical Engineering with Thesis • GTA for Routing and Switching–II • Publications: • “An Effective Locking-Free Caching Technique for Power-Aware Multicore Computing Systems,” accepted at the IEEE ICIEV-2014 conference. • “A Novel Level-1 Cache Mapping Approach to Improve System Security without Compromising Performance to Power Ratio,” in preparation.
Committee Members • Dr. Abu Asaduzzaman, EECS Dept. • Dr. Ramazan Asmatulu, ME Dept. • Dr. Zheng Chen, EECS Dept.
“Improving Performance, Power, and Security of Multicore Systems using Cache Organization” Outline • Introduction • Problem Statement • Some Important Terms • Previous Work • Proposal • Simulation • Simulation Results • Conclusions • Future Work Questions? Any time, please.
Introduction • Multicore System • A multicore system is a collection of parallel or concurrent processing units that divides a large, complex problem into many small tasks • Main goal: to solve a complex problem faster Dual-core System
Problem Statement • Challenges for Multicore System • High Average Memory Latency • High Total Power Consumption • Cache Side Channel Security Attack
Contributions • Propose a multicore system design to reduce the average memory latency • Propose a multicore system design to reduce the total power consumption • Propose a multicore system design to provide hardware level security
Some Important Terms • Cache • A small buffer that stores recently used information • Helps to bridge the speed gap between the processor and main memory • Significantly increases the overall performance of the system • Logically, the cache sits between the CPU and main memory Cache and Main Memory (Computer Desktop Encyclopedia)
Some Important Terms • Cache Organization Cache Organization • Cache Hit – the requested data is present in the cache • Cache Miss – the requested data is not present in the cache
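The hit/miss distinction above can be sketched with a toy direct-mapped cache lookup. This is not part of the thesis design; the block size, line count, and addresses are hypothetical, chosen only to illustrate how an address maps to a cache line.

```python
# Toy direct-mapped cache lookup illustrating hits and misses.
BLOCK_SIZE = 16   # bytes per cache block (hypothetical)
NUM_LINES = 8     # number of lines in the cache (hypothetical)

cache = {}        # line index -> tag currently stored in that line

def access(addr):
    """Return 'hit' or 'miss' for a byte address, updating the cache."""
    block = addr // BLOCK_SIZE
    index = block % NUM_LINES
    tag = block // NUM_LINES
    if cache.get(index) == tag:
        return "hit"
    cache[index] = tag        # on a miss, fetch the block into the line
    return "miss"

print(access(0))    # miss: cache starts empty
print(access(4))    # hit: same 16-byte block as address 0
print(access(128))  # miss: maps to line 0, evicting the previous tag
```

The third access shows why conflict misses happen: two distant addresses can compete for the same cache line.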
Some Important Terms • Cache Replacement Policy • Because cache size is limited, some blocks must be replaced to make room for new ones • Replacement should be done so that the miss ratio stays low • Some well-known cache replacement policies: Least Recently Used (LRU), Most Recently Used (MRU), Random, First In First Out (FIFO) Cache Replacement Policy (Aaron Toponce)
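LRU, the replacement policy assumed later in the simulations, can be sketched in a few lines. This is a generic illustration of the policy, not the simulator's implementation; the class and method names are made up for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU replacement sketch: evict the least recently used block."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block id -> data, oldest first

    def access(self, block, data=None):
        if block in self.blocks:
            self.blocks.move_to_end(block)    # mark as most recently used
            return True                       # hit
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # evict the LRU block
        self.blocks[block] = data
        return False                          # miss

c = LRUCache(capacity=2)
c.access("A"); c.access("B")
c.access("A")            # touch A, so B becomes least recently used
c.access("C")            # evicts B, not A
print("B" in c.blocks)   # False
```

The key property is that recency of use, not insertion order, decides the victim, which is what keeps the miss ratio low for workloads with temporal locality.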
Some Important Terms • Memory Update Policy • A combination of a read policy and a write policy • Read Policy – specifies how a word is read • Write Policy – specifies how a write to a memory block is handled. Examples: Write-Through, Write-Back
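The write-back policy, assumed later in the simulations, can be contrasted with write-through in a small sketch. This is a toy single-block model for illustration only; all names are hypothetical.

```python
class WriteBackCache:
    """Sketch of a write-back policy: writes mark a block dirty, and main
    memory is updated only when the dirty block is evicted."""
    def __init__(self):
        self.memory = {}      # stand-in for main memory
        self.block = None     # single cached block: (addr, value, dirty)

    def write(self, addr, value):
        if self.block is not None and self.block[0] != addr:
            self.evict()                    # make room for the new block
        self.block = (addr, value, True)    # dirty: memory not yet updated

    def evict(self):
        addr, value, dirty = self.block
        if dirty:
            self.memory[addr] = value       # write back on eviction only
        self.block = None

c = WriteBackCache()
c.write(0x10, 42)
print(0x10 in c.memory)   # False: write-back defers the memory update
c.evict()
print(c.memory[0x10])     # 42: memory updated only at eviction time
```

A write-through cache would instead update `self.memory` on every `write`, trading extra memory traffic for a simpler consistency story.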
Some Important Terms • Cache Locking • Lock the most frequently used data for future accesses • Locked blocks are never chosen for replacement • Increases the hit ratio and performance • Reduces average memory access time and power consumption • Problems: hard to predict which blocks to lock, not all processor configurations support it, and locking reduces the effective cache size Locked Cache
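The interaction between locking and replacement can be shown in one small function: the replacement policy simply skips locked lines when picking a victim. This is a generic sketch, not the thesis mechanism; block names are made up.

```python
def choose_victim(lru_order, locked):
    """Pick a victim line for replacement, skipping locked lines.
    lru_order lists lines from least to most recently used."""
    for line in lru_order:
        if line not in locked:
            return line
    return None   # everything locked: no replacement possible

lru_order = ["B3", "B1", "B7", "B2"]               # B3 is least recently used
print(choose_victim(lru_order, locked={"B3", "B1"}))   # B7, not B3
```

The `None` case makes the effective-cache-size drawback concrete: every locked line is one fewer line the replacement policy can use.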
Some Important Terms • Victim Cache • One of the oldest and most popular techniques to improve performance • Placed between CL1 and CL2 • Holds the victim blocks evicted during cache replacement • Reduces average memory latency and total power consumption Victim Cache Organization
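A classic victim cache (in the style of Jouppi [6]) can be sketched as a small buffer of evicted blocks checked before going down the hierarchy. This illustrates the general technique, not the proposed SVC; names are illustrative.

```python
class VictimCache:
    """Small fully associative buffer holding blocks evicted from CL1.
    A CL1 miss that hits here avoids a trip to CL2 or main memory."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.blocks = []               # FIFO list of victim blocks

    def insert(self, block):
        if len(self.blocks) >= self.capacity:
            self.blocks.pop(0)         # oldest victim falls out
        self.blocks.append(block)

    def lookup(self, block):
        if block in self.blocks:
            self.blocks.remove(block)  # swap back into CL1 on a hit
            return True
        return False

vc = VictimCache()
vc.insert("evicted_block_A")           # CL1 evicts a block into the buffer
print(vc.lookup("evicted_block_A"))    # True: served without touching CL2
```

A hit in this buffer costs far less latency and power than a CL2 or main-memory access, which is where the improvements come from.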
Some Important Terms • Stream Buffering • On a cache miss, the required block along with some additional sequential blocks is fetched from main memory into CL2 and then copied to CL1 • The additional blocks are kept in a stream buffer • Helps to reduce average memory latency and total power consumption
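Stream buffering is essentially sequential prefetch on a miss, which a short sketch makes concrete. The buffer depth and names here are hypothetical, not taken from the thesis.

```python
class StreamBuffer:
    """On a miss, fetch the missed block plus the next few sequential
    blocks; the extras wait in the buffer for likely future accesses."""
    def __init__(self, depth=4):
        self.depth = depth
        self.buffer = []

    def miss(self, block):
        # The missed block goes to the cache; prefetched successors stay here
        self.buffer = [block + i for i in range(1, self.depth + 1)]
        return block

    def lookup(self, block):
        return block in self.buffer

sb = StreamBuffer()
sb.miss(100)            # fetch block 100, prefetch blocks 101..104
print(sb.lookup(101))   # True: sequential access served from the buffer
```

Workloads with strong spatial locality (streaming media like MPEG-4) benefit the most, since the next blocks are usually the ones just prefetched.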
Some Important Terms • Cache Side Channel Attack • A hardware attack, mainly on the cache • Extracts sensitive information from the cache by passive monitoring • Exploits physical properties (examples: timing variation, power consumption, sound variation, heat production) [1,2,3,4] • Silent, yet among the most dangerous attacks
Some Important Terms • Asymmetric Encryption • Step 1: The receiver generates a private/public key pair and shares the public key with the sender • Step 2: The sender encrypts the information using the public key • Step 3: The sender sends the encrypted information to the receiver • Step 4: The receiver decrypts the information using its own private key Asymmetric Encryption
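The four steps above can be walked through with textbook RSA. The tiny primes below are for illustration only and offer no real security; this is a toy, not a production scheme.

```python
# Toy RSA walk-through of the four asymmetric-encryption steps.

# Step 1: the receiver generates a key pair and publishes (e, n)
p, q = 61, 53                 # tiny demo primes (insecure on purpose)
n = p * q                     # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent (modular inverse, Python 3.8+)

# Step 2: the sender encrypts with the receiver's public key (e, n)
message = 42
ciphertext = pow(message, e, n)

# Step 3: the ciphertext travels over the insecure channel (nothing to do here)

# Step 4: the receiver decrypts with the private key (d, n)
recovered = pow(ciphertext, d, n)
print(recovered)   # 42
```

Note the asymmetry: anyone holding (e, n) can encrypt, but only the holder of d can decrypt — which is exactly why leaking key material through a cache side channel is so damaging.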
Previous Work • To Improve Average Memory Latency and Total Power Consumption: • Victim cache between CL1 and main memory, with stream buffering [6] • Problem – no guarantee that the victim blocks are the ones with the maximum number of misses • Selective victim caching [7] • Problem – may pollute the cache; requires prediction
Previous Work • Selective pre-fetching [8] • Problem – requires a history of references • Cache locking [9] • Problem – hard to predict the blocks with high miss counts; not all processor configurations support locking • To Improve Cache Level Security: • Partitioned cache [1] • Problem – cache underutilization; depends on software support • Dynamic memory-to-cache remapping [5]
Proposed Mechanism MCB = Miss Cache Block VCB = Victim Cache Block SBB = Stream Buffering Block BACMI = Block Address and Cache Miss Information SLLC = Shared Last Level Cache • Smart Victim Cache (SVC) Proposed Cache Organization with SVC
Work Flow Diagram Work Flow Diagram
Proposed Mechanism Maximum Number of BACMI entries for a given SVC with various MCB MCB = Miss Cache Block VCB = Victim Cache Block SBB = Stream Buffering Block BACMI = Block Address and Cache Miss Information
Simulation • Assumptions • SVC can be enabled and disabled • All cores equally share SVC • LRU replacement policy is used • Write-Back update policy is used
Simulation • Workload • Moving Picture Experts Group’s – 4 (MPEG-4) • Advanced Video Coding (H.264/AVC) • Matrix Inversion (MI) • Fast Fourier Transform (FFT) • H.264/AVC behaves similar to MPEG-4 • MI behaves similar to FFT
Simulation • Input Parameters • Number of cores = 4 • SVC size = 2, 4, 8, 16, 32 KB • I1/D1 size of CL1 = 8/8, 16/16, 32/32, 64/64, 128/128 KB • CL2 size = 256, 512, 1024, 2048, 4096 KB • Line size = 16, 32, 64, 128, 256 B • Associativity level = 1-, 2-, 4-, 8-, 16-way
Simulation • Assumptions for Delay Penalty • Number of cycles for a load or store operation = 100 • Number of cycles for a branch operation = 150
Simulation • Assumption for Power Consumption
Simulation Results • Impact of SVC Size
Simulation Results • Impact of SVC and CL1 Size Impact of SVC and CL1 Size on Memory Latency and Total Power Consumption • For MPEG-4, both the latency and the total power consumption decrease as the cache size increases • For MPEG-4, latency and power decrease the most with SVC and no locking
Simulation Results • Impact of SVC and Line Size Impact of SVC and Line Size on Memory Latency and Total Power Consumption • For MPEG-4, latency and power consumption decrease as the line size increases • For MPEG-4, latency and power consumption both decrease the most with SVC and no locking
Simulation Results • Impact of SVC and Associativity Level Impact of SVC and Associativity Level on Memory Latency and Total Power Consumption • For MPEG-4, latency and power consumption decrease as the associativity level increases • For MPEG-4, latency and power consumption decrease the most with SVC and no locking
Simulation Results • Impact of SVC and CL2/SLLC Size Impact of SVC and CL2/SLLC Size on Memory Latency and Total Power Consumption • For MPEG-4, as the CL2 size increases, latency stabilizes but power consumption increases • For MPEG-4, both latency and power consumption decrease the most when using SVC and no locking
Simulation Results • Comparison of SVC and Cache Line Locking Comparison of SVC and Cache Line Locking • Average memory latency and total power consumption decrease as the locked portion of CL2 increases from 0% to 25% • Both average memory latency and total power consumption decrease more with SVC and no locking than with locking, or with neither SVC nor locking
Proposed Solution for Security Improvement • Randomized Cache Mapping Between D1X and CL1 (Solution-1) Randomized Cache Mapped Between D1X and CL1
Proposed Solution for Security Improvement • Problems with Solution-1 • Extra hardware (D1X) must be implemented • Increases memory latency • Increases total power consumption by about 17%
Proposed Modified Solution for Security Improvement • Randomized Cache Mapping Between Main Memory and CL1 (Solution-2) Randomized Cache Mapping between CL1 and Main Memory • It is expected that the probability of a successful cache side channel attack decreases by roughly 40,000 times for a CL1 with 16 blocks
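The core idea of Solution-2 can be sketched as replacing the fixed address-modulo-index mapping with a secret permutation. This is only an illustration of the randomization principle under assumed parameters (a 16-block CL1, as in the slide), not the thesis hardware design.

```python
import random

NUM_BLOCKS = 16   # CL1 blocks, matching the slide's example

# Secret permutation replacing the fixed modulo mapping: an attacker who
# observes which cache set was touched cannot directly infer the address.
permutation = list(range(NUM_BLOCKS))
random.shuffle(permutation)

def cache_index(block_addr):
    """Map a memory block to a CL1 index through the secret permutation."""
    return permutation[block_addr % NUM_BLOCKS]

# Same modulo class -> same (but unpredictable) cache index
print(cache_index(5) == cache_index(21))   # True
```

With 16 blocks there are 16! possible permutations, so an attacker monitoring cache-set activity must additionally recover the secret mapping before set indices reveal anything about addresses.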
Conclusions • Using several levels of cache in multicore systems causes serious performance and power issues • A cache shared among cores in a multicore system creates a hardware-level security threat • The proposed SVC significantly improves system performance by reducing memory latency and power consumption • The proposed cache randomization technique between main memory and CL1 significantly reduces the probability of a cache attack
Conclusions • With SVC, average memory latency is reduced by 17% compared to CL2 cache locking • With SVC, total power consumption is reduced by 21% compared to CL2 cache locking • According to our estimates, the probability of a cache side channel attack decreases by roughly 40,000 times for a CL1 with 16 blocks
Future Work • Explore the impact of SVC on average memory latency and total power consumption for real-time embedded systems and handheld computers • Explore the randomized cache mapping technique between CL1 and main memory on real-time embedded systems and handheld computers
“Improving Performance, Power, and Security of Multicore Systems using Cache Organization” Questions?
“Improving Performance, Power, and Security of Multicore Systems using Cache Organization” Thank You Contact: Full Name: Tania Jareen Telephone: (316) 516-8516 E-mail: txjareen@wichita.edu
References 1. D. Page, “Partitioned Cache Architecture as a Side-Channel Defense Mechanism,” in Cryptology ePrint Archive, Report 2005/280, 2005. 2. O. Aciicmez, “Yet Another Microarchitectural Attack: Exploiting I-Cache,” in CSAW ’07: Proceedings of the 2007 ACM Workshop on Computer Security Architecture, pp. 11-18, DOI: 10.1145/1314466.1314469, 2007. 3. P.C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems,” Springer Berlin Heidelberg, pp. 104-113, DOI: 10.1007/3-540-68697-5_9, 1996. 4. P. Kocher, et al., "Differential Power Analysis," in Proceedings of the 19th Annual International Cryptology Conference on Advances in Cryptology, 1999.
References 5. Z. Wang and R.B. Lee, "A Novel Cache Architecture with Enhanced Performance and Security," in Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture, 2008. 6. N.P. Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” Western Research Laboratory (WRL), Digital Equipment Corporation, URL: https://www.cis.upenn.edu/~cis501/papers/joupp victim.pdf, 1990. 7. D. Stiliadis and A. Varma, “Selective Victim Caching: A Method to Improve the Performance of Direct-Mapped Caches,” in IEEE Transactions on Computers, Vol. 46, No. 5, pp. 603-610, DOI: 10.1109/12.589235, 2002.
References 8. R. Pendse and H. Katta, “Selective Prefetching: Prefetching when only required,” in the 42nd Midwest Symposium on Circuits and Systems, Vol. 2, pp. 866-869, DOI: 10.1109/MWSCAS.1999.867772, 1999. 9. A. Asaduzzaman, F.N. Sibai, and M. Rani, “Improving cache locking performance of modern embedded systems via the addition of a miss table at the L2 cache level,” in the EUROMICRO Journal of Systems Architecture, Vol. 56, Issue 4-6, pp. 151-162, 2010.