Introduction to CUDA Programming

The introductory lecture of an Advanced Computer Architecture course (Andreas Moshovos, Winter 2019). It helps you decide whether the course is right for you, covering the prerequisites and what is expected of you. It then previews modern processor design through security attacks. Topics include Intel's Sandy Bridge architecture (chip and core organization, memory access units, the L3 ring) and the Meltdown and Spectre attacks (checking whether an address is cached, Flush+Reload, exception handling and suppression, branch-prediction exploitation). It closes with Moinuddin Qureshi's CEASER, which mitigates conflict-based cache attacks through encrypted addresses and remapping.

rdanaher

Presentation Transcript


  1. Introduction to CUDA Programming • Advanced Computer Architecture • Andreas Moshovos • Winter 2019 • Introduction

  2. Goals for Today • Should you take the course? • What should you know? • What is expected of you? • What will you get out of it?

  3. Should you take the course? • What should you know? • I’ll let you decide on your own • To help you: • An overview of modern processor design • Through a review of modern security attacks • And, to make it worthwhile for everyone: • Through a review of an excellent recent work on mitigating them

  4. Material for today • Intel Sandy Bridge overview • An older design, but as far as our discussion is concerned Intel still uses the same microarchitecture with “minor” tweaks • Meltdown and Spectre attacks • You really have to understand the microarchitecture to “get” them • Moin Qureshi’s CEASER: Mitigating Conflict-Based Cache Attacks via Encrypted-Address and Remapping, MICRO 2018: 775-787, best paper award

  5. Intel’s Sandy Bridge Architecture

  6. Chip Architecture • 32nm

  7. Overall Core Architecture

  8. Core Architecture – Instruction Fetch

  9. Instruction Decode

  10. Out-of-Order Execution

  11. Execution Units – Scalar, SIMD, FP and AVX (vector)

  12. Memory Access Units • L1 load-to-use latency is 4 cycles • Banked to support multiple accesses • Poor man’s multiporting? • L2 load-to-use latency is 12 cycles
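To see how an address selects an L1 set, here is a small Python sketch. It assumes Sandy Bridge's L1 data cache geometry of 32KB, 8-way set-associative with 64-byte lines (figures not on the slide); the function names are illustrative. It also checks that the 6 offset bits plus 6 index bits fit inside the 12-bit offset of a 4KB page.

```python
# Split an address into (tag, set index, line offset) for a set-associative
# cache. Defaults are Sandy Bridge's L1D geometry (32 KB, 8-way, 64 B lines);
# these figures are not on the slide.

L1_SIZE = 32 * 1024
L1_WAYS = 8
LINE_BYTES = 64
PAGE_BYTES = 4096


def cache_geometry(size, ways, line):
    """Return (number of sets, offset bits, index bits); sizes are powers of 2."""
    sets = size // (ways * line)
    offset_bits = line.bit_length() - 1   # log2(line)
    index_bits = sets.bit_length() - 1    # log2(sets)
    return sets, offset_bits, index_bits


def split_address(addr, size=L1_SIZE, ways=L1_WAYS, line=LINE_BYTES):
    """Decompose addr into (tag, set_index, offset)."""
    sets, offset_bits, index_bits = cache_geometry(size, ways, line)
    offset = addr & (line - 1)
    set_index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


sets, off_bits, idx_bits = cache_geometry(L1_SIZE, L1_WAYS, LINE_BYTES)
print(sets, off_bits, idx_bits)                       # 64 6 6
print(sets * LINE_BYTES == PAGE_BYTES)                # True: index lies in page offset
print([hex(f) for f in split_address(0x12345678)])    # ['0x12345', '0x19', '0x38']
```

Because the set index uses only page-offset bits, the set can be selected in parallel with address translation (a virtually indexed, physically tagged lookup).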

  13. L3 and ring interconnect • CPUs, GPU and system agent • Communicate via a ring • Each CPU has a 2MB, 8-way set-associative L3 slice • Static hash of addresses to slices • Latency varies with which core accesses which slice • 26-31 cycles • Max bandwidth: 435.2GB/s at 3.4GHz
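A quick Python check of the slide's L3 numbers. With 64-byte lines (an assumption), a 2MB 8-way slice has 4096 sets, and 435.2GB/s at 3.4GHz works out to 128 bytes per cycle. The slice hash below is a hypothetical XOR fold, since Intel does not document the real static hash, and NUM_SLICES = 4 assumes a quad-core part.

```python
# Back-of-the-envelope L3 numbers from the slide (2 MB, 8-way slices;
# 435.2 GB/s at 3.4 GHz), assuming 64-byte lines. The slice hash is
# HYPOTHETICAL: the real static hash is not documented by Intel.

LINE_BYTES = 64
SLICE_BYTES = 2 * 1024 * 1024
SLICE_WAYS = 8
NUM_SLICES = 4                      # assumption: quad-core part, one slice per core

sets_per_slice = SLICE_BYTES // (SLICE_WAYS * LINE_BYTES)
bytes_per_cycle = 435.2e9 / 3.4e9   # peak ring bandwidth per core clock


def hypothetical_slice(paddr, num_slices=NUM_SLICES):
    """Illustrative slice hash: XOR-fold the line address into log2(num_slices) bits."""
    width = (num_slices - 1).bit_length()
    line, h = paddr >> 6, 0
    while line:
        h ^= line & (num_slices - 1)
        line >>= width
    return h


def l3_location(paddr):
    """Return (slice, set within that slice) for a physical address."""
    set_index = (paddr >> 6) & (sets_per_slice - 1)
    return hypothetical_slice(paddr), set_index


print(sets_per_slice)               # 4096
print(round(bytes_per_cycle))       # 128
print(l3_location(0xCAFE0000))
```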

  14. Overall Core Architecture

  15. Meltdown

  16. Isolation • User vs. kernel space isolation • Privilege bit • Kernel space is mapped into every user process’s address space

  17. Overview of attack • Exploit speculative execution • To temporarily load protected data into a register • Use the value to cause a micro-architectural state change that persists (e.g., a cache line fill)

  18. Meltdown Attack

  19. Timing Scenarios

  20. How to check whether an address is cached?

  21. Flush+Reload • clflush instruction • Flushes the cache line that contains a specific address from every level of the cache hierarchy • Works for shared addresses • Think of code that is shared between two processes • The paper shows how to use this to read a secret key by detecting the order in which functions in the shared code are called
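The Flush+Reload loop can be sketched as a deterministic Python simulation. The latencies (30 cycles for a cached line, 200 for memory, threshold 100) are assumed round numbers. SharedCache, victim_step, and the two function addresses are hypothetical; on real hardware the attacker uses clflush and a cycle counter such as rdtsc. In the spirit of the key-recovery example, the victim calls a "multiply" routine only when a secret bit is 1.

```python
HIT_CYCLES = 30      # assumed: cached line (near the slide's L3 latency)
MISS_CYCLES = 200    # assumed: DRAM access
THRESHOLD = 100      # faster than this => the line was cached


class SharedCache:
    """Tracks only which 64-byte lines are cached; no capacity limits."""

    def __init__(self):
        self.lines = set()

    def access(self, addr):
        """Load addr; return the modelled latency and leave the line cached."""
        line = addr >> 6
        latency = HIT_CYCLES if line in self.lines else MISS_CYCLES
        self.lines.add(line)
        return latency

    def clflush(self, addr):
        """Evict the line that holds addr (models the clflush instruction)."""
        self.lines.discard(addr >> 6)


# Hypothetical addresses of two functions in code shared with the victim.
SQUARE_FN, MULTIPLY_FN = 0x400000, 0x401000


def victim_step(cache, secret_bit):
    """Victim processes one key bit: always square, multiply only for a 1."""
    cache.access(SQUARE_FN)
    if secret_bit:
        cache.access(MULTIPLY_FN)


def flush_reload(cache, addr, run_victim):
    """Flush addr, let the victim run, then reload addr and time it."""
    cache.clflush(addr)
    run_victim()
    return cache.access(addr) < THRESHOLD


cache = SharedCache()
secret = [1, 0, 1, 1]
leaked = [int(flush_reload(cache, MULTIPLY_FN, lambda b=b: victim_step(cache, b)))
          for b in secret]
print(leaked)        # [1, 0, 1, 1]
```

The flush at the start of each round matters: the reload itself caches the line, so without it every later round would report a hit.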

  22. Meltdown skeleton

  23. Exceptions • E.g., divide by zero or an access to an illegal address • Suppression vs. Handling • Handling: • fork prior to the exception, so only the child process faults • Suppression: • Use transactional memory, so the fault aborts the transaction instead of raising an exception • Branch prediction exploitation (Spectre)

  24. Example • Line 4: attempt to read the secret (kernel) address [rcx] into register al • This raises a protection exception, but the CPU still performs the load as part of speculative execution; the exception is only handled when the mov tries to retire (commit) • shl by 12 multiplies the value in al by 4K, the page size • The mov into rbx then reads a probe-array page at a distance of al × 4KB from its base • This is a race against the exception, so the dependent load may or may not happen • Step 2: the attacker times accesses to all 256 pages; the one that hits in the cache reveals the secret byte
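The covert channel in this example (encode a byte as which of 256 pages is cached, then time all 256) can be modelled in Python. This is a toy: ToyCache, its latencies, and the probe-array address are assumptions, nothing touches kernel memory, and the model always wins the race that the real attack may lose.

```python
PAGE = 4096
HIT, MISS = 30, 200          # assumed latencies in cycles


class ToyCache:
    """Tracks which 64-byte lines are cached."""

    def __init__(self):
        self.lines = set()

    def load(self, addr):
        line = addr >> 6
        latency = HIT if line in self.lines else MISS
        self.lines.add(line)
        return latency

    def flush(self, addr):
        self.lines.discard(addr >> 6)


def transient_encode(cache, probe_base, secret_byte):
    """Lasting effect of the transient sequence: al = secret byte,
    rax <<= 12 (times 4 KB), load [rbx + rax]. The register results are
    squashed at retirement; only this cache fill survives."""
    cache.load(probe_base + (secret_byte << 12))


def decode(cache, probe_base):
    """Step 2: time all 256 pages; the fastest one names the byte."""
    timings = [cache.load(probe_base + i * PAGE) for i in range(256)]
    return min(range(256), key=timings.__getitem__)


probe_base = 0x10000000      # hypothetical probe-array address (rbx)
cache = ToyCache()
recovered = []
for byte in b"key!":                        # stand-in for protected data
    for i in range(256):
        cache.flush(probe_base + i * PAGE)  # start from a cold probe array
    transient_encode(cache, probe_base, byte)
    recovered.append(decode(cache, probe_base))
print(bytes(recovered))                     # b'key!'
```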

  25. Spectre

  26. SPECTRE

  27. CEASER

  28. Goal • The LLC is shared • The LLC contains micro-architectural state • Through side-channel attacks on the LLC, a process can read values from another process • To do so, the attacker needs to know how addresses are mapped onto LLC sets • CEASER: an encrypted address-to-set mapping that changes frequently

  29. CEASER: Mitigating Conflict-Based Attacks via Encrypted-Address and Remapping MICRO-2018 Moinuddin Qureshi

  30. Background: Resource Sharing • Modern systems share the LLC to improve resource utilization [Figure: two cores sharing one LLC] • Sharing the LLC allows the system to dynamically allocate LLC capacity

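  31. Conflict-Based Cache Attacks • A co-running spy can infer the victim’s access pattern by causing cache conflicts [Figure: spy and victim cores share the LLC; when the victim accesses a set, the spy’s line B in that set is evicted, so the spy’s next access to B misses and reveals the accessed set] • Conflicts leak the access pattern, which is used to infer secrets [AES – Bernstein’05]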
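An attack of this kind (often called Prime+Probe) can be sketched on an LRU cache whose set index is simply the line address modulo the number of sets, the predictable mapping CEASER targets. NUM_SETS, WAYS, and the victim's line are illustrative assumptions.

```python
from collections import OrderedDict

NUM_SETS, WAYS = 1024, 16


class LRUCache:
    """Set-associative cache, set = line % NUM_SETS, LRU replacement."""

    def __init__(self):
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def access(self, line):
        """Return True on a hit; on a miss insert the line, evicting the LRU one."""
        s = self.sets[line % NUM_SETS]
        if line in s:
            s.move_to_end(line)
            return True
        if len(s) == WAYS:
            s.popitem(last=False)
        s[line] = None
        return False


def prime_probe(cache, target_set, victim_line):
    """One round against target_set; True means the victim touched that set."""
    spy = [target_set + k * NUM_SETS for k in range(1, WAYS + 1)]
    for ln in spy:                                     # prime: fill the set
        cache.access(ln)
    cache.access(victim_line)                          # victim runs
    return any([not cache.access(ln) for ln in spy])   # probe every line


cache = LRUCache()
victim_line = 5000 * NUM_SETS + 37       # hypothetical secret-dependent access
touched = [s for s in range(NUM_SETS) if prime_probe(cache, s, victim_line)]
print(touched)                           # [37]
```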

  32. Prior Solutions • Table-Based Randomization: RPCache [ISCA’07], NewCache [MICRO’08] • Uses a Mapping Table (MT) • Mapping Table is large for an LLC (MBs) • OS support needed to protect the table • Way Partitioning: NoMo [TACO’12], CATalyst [HPCA’16] • Inefficient use of cache space • Not scalable to many cores

  33. Our Goal • Protect the LLC from conflict-based attacks, while incurring • Negligible storage overhead • Negligible performance overhead • No OS support • No restriction on capacity sharing • Localized Implementation

  34. Outline • Why? • CEASE • CEASER • Effective?

  35. CEASE: Cache using Encrypted Address Space • Insight: don’t memorize the random mapping, compute it [Figure: the Physical Line Address (PLA), e.g. 0xCAFE0000, is encrypted with a key into an Encrypted Line Address (ELA), e.g. 0xa17b20cf, which is used to access the LLC; on a dirty eviction the ELA is decrypted with the same key back into the PLA] • Localized change (the ELA is visible only within the cache) • Cache operations (access, coherence, prefetch) all remain unchanged
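The CEASE data path (encrypt the PLA to index and tag the LLC, decrypt the ELA only on a dirty eviction) can be sketched in Python. The keyed permutation is a stand-in (XOR, multiply by an odd constant modulo 2^40, XOR) chosen because it is trivially invertible; it is not the paper's LLBC and offers no real security. CeaseLLC, its parameters, and the key pair are hypothetical.

```python
from collections import OrderedDict

BITS = 40
MASK = (1 << BITS) - 1
MUL = 0x9E3779B97                  # odd, hence invertible modulo 2**40
MUL_INV = pow(MUL, -1, 1 << BITS)  # requires Python 3.8+


def encrypt(pla, key):
    """STAND-IN keyed permutation on 40-bit line addresses (not LLBC)."""
    k1, k2 = key
    return (((pla ^ k1) * MUL) & MASK) ^ k2


def decrypt(ela, key):
    k1, k2 = key
    return (((ela ^ k2) * MUL_INV) & MASK) ^ k1


class CeaseLLC:
    def __init__(self, num_sets, ways, key):
        self.num_sets, self.ways, self.key = num_sets, ways, key
        self.sets = [OrderedDict() for _ in range(num_sets)]  # ELA -> dirty bit
        self.writebacks = []             # PLAs sent back to memory

    def access(self, pla, write=False):
        ela = encrypt(pla, self.key)     # only the ELA is used inside the cache
        s = self.sets[ela % self.num_sets]
        hit = ela in s
        if hit:
            s.move_to_end(ela)
        elif len(s) == self.ways:        # evict the LRU line of this set
            victim_ela, dirty = s.popitem(last=False)
            if dirty:                    # memory needs the real address back
                self.writebacks.append(decrypt(victim_ela, self.key))
        s[ela] = s.get(ela, False) or write
        return hit


key = (0x12345, 0xABCDE)                 # hypothetical key pair
llc = CeaseLLC(num_sets=4, ways=1, key=key)
for pla in range(32):
    llc.access(pla, write=True)          # every line is dirty
print(sorted(llc.writebacks)[:5])        # writebacks carry original PLAs
```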

  36. Randomization via Encryption • Lines that mapped to the same set get scattered to different sets [Figure: lines A and B share a set in a conventional LLC; CEASE with one key maps them to A’ and B’ in different sets, and with another key to A’’ and B’’] • The mapping depends on the key; different machines have different keys

  37. Encryption: Need a Fast, Small-Width Cipher • A block cipher maps a B-bit plaintext to a B-bit ciphertext • The PLA is ~40 bits (up to 64TB of memory) • A conventional, wider block cipher would mean a larger tag (80+ bits) and a latency of 10+ cycles • Small-width ciphers are deemed insecure: • Brute-force attack on the key • Memorize all input-output pairs • Insight: the ELA is not visible to the attacker (so it is okay to use a 40-bit block cipher)
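The ~40-bit PLA width follows from the memory size if lines are 64 bytes (an assumption; the slide does not give the line size). The same arithmetic sizes the "memorize all input-output pairs" attack on a visible 40-bit cipher: the full table has 2^40 entries.

```python
# Width of a Physical Line Address (PLA) for 64 TB of memory, assuming
# 64-byte lines (the slide does not state the line size).

MEMORY_BYTES = 64 * 2**40
LINE_BYTES = 64

num_lines = MEMORY_BYTES // LINE_BYTES     # 2**46 / 2**6 = 2**40 lines
pla_bits = (num_lines - 1).bit_length()    # bits needed to name every line
print(pla_bits)                            # 40

# A full input->output table for a 40-bit permutation: 2**40 entries of
# 5 bytes (40 bits) each.
table_bytes = num_lines * 5
print(table_bytes / 2**40)                 # 5.0 (TiB)
```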

  38. Low-Latency Block Cipher (LLBC) • Four-stage Feistel network (with a substitution-permutation network), inspired by DES and Blowfish • LLBC encryption incurs a delay of 24 XOR gates (approximately 2-cycle latency)
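The four-round Feistel structure can be sketched in Python over a 40-bit block split into two 20-bit halves. The round function F (key XOR, PRESENT's 4-bit S-box, a rotate) and the round keys are hypothetical placeholders, not LLBC's actual substitution-permutation network. The point is structural: decryption runs the same rounds in reverse order, whatever F is.

```python
HALF = 20
HMASK = (1 << HALF) - 1
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # PRESENT's 4-bit S-box


def F(x, k):
    """HYPOTHETICAL round function: key mix, 4-bit S-boxes, rotate left by 7."""
    x ^= k
    y = 0
    for i in range(0, HALF, 4):
        y |= SBOX[(x >> i) & 0xF] << i
    return ((y << 7) | (y >> (HALF - 7))) & HMASK


def feistel_encrypt(block, round_keys):
    left, right = block >> HALF, block & HMASK
    for k in round_keys:                      # (L, R) -> (R, L ^ F(R))
        left, right = right, left ^ F(right, k)
    return (left << HALF) | right


def feistel_decrypt(block, round_keys):
    left, right = block >> HALF, block & HMASK
    for k in reversed(round_keys):            # undo one round at a time
        left, right = right ^ F(left, k), left
    return (left << HALF) | right


round_keys = [0x3A5F1, 0x0C0DE, 0xBEEF0, 0x12345]   # hypothetical 20-bit keys
pla = 0xCAFE0000
ela = feistel_encrypt(pla, round_keys)
assert feistel_decrypt(ela, round_keys) == pla
print(hex(pla), "->", hex(ela))
```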

  39. Outline • Why? • CEASE • CEASER • Effective?

  40. Let’s Break CEASE … [Liu et al. S&P, 2015] • Form a pattern of lines such that the cache has a conflict miss • Remove one line from the pattern and check for a conflict: • Conflict miss? Yes: the removed line is NOT in the conflicting set • No: the removed line MAPS to the conflicting set • Attacker can break CEASE within 22 seconds (8MB LLC)
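The removal step on this slide can be sketched in Python. The mapping and the oracle are hypothetical: a keyed hash (hidden_set) stands in for the encrypted mapping, and conflict() replaces the timing measurement a real attacker performs. The sketch starts from 400 random lines, more than NUM_SETS × WAYS = 256, so some set must overflow. The loop then shrinks the pattern to WAYS + 1 lines that all map to one set.

```python
import random

NUM_SETS, WAYS = 64, 4
KEY = 0x5EED                       # hypothetical secret key


def hidden_set(line):
    """Secret line-to-set mapping (stand-in for the encrypted mapping)."""
    return (((line ^ KEY) * 0x9E3779B1) >> 12) % NUM_SETS


def conflict(pattern):
    """Oracle for 'does accessing this pattern cause a conflict miss?':
    true when more than WAYS lines of the pattern share one set."""
    counts = {}
    for line in pattern:
        s = hidden_set(line)
        counts[s] = counts.get(s, 0) + 1
    return any(c > WAYS for c in counts.values())


def reduce_to_eviction_set(pattern):
    """Drop lines one at a time, keeping a drop only if the conflict persists."""
    pattern = list(pattern)
    if not conflict(pattern):
        raise ValueError("start from a pattern that causes a conflict miss")
    i = 0
    while i < len(pattern):
        trial = pattern[:i] + pattern[i + 1:]
        if conflict(trial):
            pattern = trial        # still conflicts: line not needed, drop it
        else:
            i += 1                 # conflict gone: line maps to the set, keep it
    return pattern


rng = random.Random(1)
candidates = rng.sample(range(1 << 20), 400)   # 400 > NUM_SETS * WAYS = 256
ev = reduce_to_eviction_set(candidates)
print(len(ev), {hidden_set(ln) for ln in ev})  # 5 lines, one set index
```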

  41. CEASER: CEASE with Remapping • Split time into epochs of N accesses (change the key every epoch) [Figure: bulk remapping switches the entire cache to a new key at once at each epoch boundary; gradual remapping keeps a CurrKey and a NextKey and moves the cache from one to the other during the epoch] • CEASER uses gradual remapping to periodically change the keys

  42. CEASER: CEASE with Remapping • Remap rate of 1%: remap a W-way set after every (100*W) accesses [Figure: a SetPtr sweeps through the sets as accesses accumulate (Access = 0, 200, 400, 600, 800 in Epoch 0 for a 2-way cache), moving each set’s lines (A0/A1, B0/B1, ...) from CurrKey to NextKey; after the sweep, the access count restarts at 0 in Epoch 1] • Cache access: if (Set[CurrKey] < SPtr) use NextKey • CEASER with gradual remapping needs negligible hardware (one bit per line)
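The key-selection rule can be sketched in Python. GradualRemapper, its set-index hash, and the key generation are hypothetical, and lines are not actually moved here; only the SPtr/epoch bookkeeping and the lookup rule are modelled. With num_sets=4 and ways=2, the sketch remaps a set every 200 accesses and starts epoch 1 at access 800, in line with the access counts shown on the slide.

```python
import random


class GradualRemapper:
    def __init__(self, num_sets, ways, remap_rate_pct=1, seed=0):
        self.num_sets, self.ways = num_sets, ways
        self.period = (100 // remap_rate_pct) * ways   # accesses per set remap
        self.rng = random.Random(seed)
        self.curr_key = self.rng.getrandbits(40)
        self.next_key = self.rng.getrandbits(40)
        self.sptr = 0          # sets below SPtr are already remapped
        self.accesses = 0
        self.epoch = 0

    def set_index(self, line, key):
        """HYPOTHETICAL keyed mapping standing in for LLBC encryption."""
        return ((line ^ key) * 0x9E3779B1 >> 16) % self.num_sets

    def lookup_set(self, line):
        """Slide's rule: if Set[CurrKey] < SPtr, use NextKey."""
        s = self.set_index(line, self.curr_key)
        if s < self.sptr:
            return self.set_index(line, self.next_key)
        return s

    def tick(self):
        """Count one cache access; remap one set every `period` accesses."""
        self.accesses += 1
        if self.accesses % self.period == 0:
            # In hardware: move each line of set SPtr to its NextKey set here.
            self.sptr += 1
            if self.sptr == self.num_sets:      # sweep done: rotate keys
                self.curr_key = self.next_key
                self.next_key = self.rng.getrandbits(40)
                self.sptr = 0
                self.epoch += 1


r = GradualRemapper(num_sets=4, ways=2)   # period = 200 accesses per set
for _ in range(800):
    r.tick()
print(r.epoch, r.sptr)                    # 1 0
```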

  43. Outline • Why? • CEASE • CEASER • Effective?

  44. Security Analysis • Time to learn an “eviction set” for one set (the vulnerability is removed after a remap, <1ms) • Limits the impact on miss rate, energy, and accesses to ~1% • CEASER can tolerate years of attack (even with a remap rate of 1%)

  45. Performance and Storage Overheads • 8 cores with an 8MB, 16-way LLC (34 workloads, SPEC + Graph) [Figure: normalized performance (%) for the Rate-34, Mix-100, and ALL-134 workload sets] • CEASER incurs negligible slowdown (~1%) and storage overhead (24 bytes)

  46. Summary • Need a practical solution to protect the LLC from conflict-based attacks • CEASER: encrypt the line address and change the key periodically [Figure: Key1 and Key2 feeding the line-address encryption in front of the cache] • Robust to attacks (years) • Negligible slowdown (~1%) • Negligible storage (24 bytes) • Localized change (within the cache) • No OS support needed • Appealing for industrial adoption

  47. Course Details

  48. On average we will meet once per week • I will be traveling at times, and I will make up for the “lost” time by holding two lectures in some weeks • What I expect you to do • Weekly reading assignments • Questionnaires to be handed in at the beginning of class • Homework • Programming assignments • Project • Validate some prior work • Do something new • Maybe present papers (we will see)
