
Memory Faults: Injection & Solutions


Presentation Transcript


  1. Memory Faults: Injection & Solutions Jeffrey Freschl, Di Xue

  2. The Problem “Memory meets corruption, it happens every day, it could happen to you…” • -- a famous quote, modified from the People Store commercial • Can Linux handle cheap memory? • Can we protect ourselves from memory faults?

  3. Talk Outline • Some Preparation (The How) • Actual Corruption and Results • A Solution (Methods and Implementation)

  4. Part I – Some Preparation (The How)

  5. Hardware vs. Software Fault Injection

  6. Software Fault Injection • SWIFI (software-implemented fault injection) is a common way to validate system design. • SWIFI gives us the flexibility we need.
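As a rough illustration of the technique (not the exact injector used in this work), a SWIFI-style fault can be as simple as XOR-ing one bit into a target word; the function name below is our own:

    /* Minimal sketch of a SWIFI-style fault: flip one bit of a target word.
     * The function name and interface are illustrative only. */
    #include <stdint.h>

    static void inject_bit_flip(void *target, unsigned int bit)
    {
        uint8_t *bytes = (uint8_t *)target;
        bytes[bit / 8] ^= (uint8_t)(1u << (bit % 8));   /* flip the chosen bit */
    }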

  7. Fault Injection Process

  8. What Do We Inject? task_struct • Process: an instance of a program in execution. • The kernel must know a process's state to manage it properly. • task_struct holds the information the kernel keeps about a process.

  9. Data Members • prio: the process's priority • run_list: entry in the runqueue, which holds the list of TASK_RUNNING processes • time_slice: amount of time the process may run • lock_depth: locking for simultaneous access • policy: FIFO, round robin, etc. • mmap_base: base of the mmap area, below the stack's low limit • vm_start: start address of the VM area
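A hypothetical kernel-module sketch of injecting into one of these members is shown below; it flips a bit in the prio field of the current task. The module and function names are our own, and field names follow the 2.6-era task_struct described above:

    /* Illustrative only: corrupt one bit of current->prio from module init. */
    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/sched.h>

    static int __init inject_init(void)
    {
        printk(KERN_INFO "inject: prio was %d\n", current->prio);
        current->prio ^= (1 << 3);          /* flip bit 3 of the priority */
        printk(KERN_INFO "inject: prio now %d\n", current->prio);
        return 0;
    }

    static void __exit inject_exit(void) { }

    module_init(inject_init);
    module_exit(inject_exit);
    MODULE_LICENSE("GPL");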

  10. Part II – Finally, Let's Start Corrupting!

  11. Good, Let's Begin the Stress! (Workloads)

  12. Results for Simple Program

  13. Running Blast

  14. Fault Propagation • EIP locates fault point • Call Trace illustrates path to fault

  15. Part III – A Solution Protecting Linux from Di’s Corruption

  16. Methods (Update & Access) • Error Correcting Codes (ECC) • Majority Vote What are the tradeoffs? Time? Space? Recoverability?

  17. Intro to Hamming Code (Magic) • Hamming Rule: d + p + 1 ≤ 2^p (d is the # of input bits, p is the # of parity bits) • Generator matrix G = [I : A], where A is a (d × p) matrix • A must have unique rows and columns
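For example, with d = 4 data bits, p = 3 parity bits satisfy the rule: 4 + 3 + 1 = 8 ≤ 2^3 = 8, which gives the classic Hamming(7,4) code used in the sketch after the next slide.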

  18. Hamming cont. (More Magic) • To encode an input string: codeword = input × G • To check whether a codeword is corrupt: H = [A^T : I], syndrome = H × codeword; if syndrome == 0 then there is no corruption, otherwise match the syndrome to a column of H
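A minimal C sketch of this encode/check step for the Hamming(7,4) case, assuming a systematic layout [d1 d2 d3 d4 p1 p2 p3]; the bit ordering and function names are our own choices, not the implementation used in this work:

    #include <stdio.h>
    #include <stdint.h>

    /* Encode 4 data bits (low nibble of 'data') into a 7-bit codeword. */
    static uint8_t hamming74_encode(uint8_t data)
    {
        uint8_t d1 = (data >> 0) & 1, d2 = (data >> 1) & 1;
        uint8_t d3 = (data >> 2) & 1, d4 = (data >> 3) & 1;
        uint8_t p1 = d1 ^ d2 ^ d4;          /* parity bits = rows of A */
        uint8_t p2 = d1 ^ d3 ^ d4;
        uint8_t p3 = d2 ^ d3 ^ d4;
        return (uint8_t)((data & 0x0F) | (p1 << 4) | (p2 << 5) | (p3 << 6));
    }

    /* Return the 3-bit syndrome; 0 means no single-bit corruption detected. */
    static uint8_t hamming74_syndrome(uint8_t cw)
    {
        uint8_t d1 = (cw >> 0) & 1, d2 = (cw >> 1) & 1;
        uint8_t d3 = (cw >> 2) & 1, d4 = (cw >> 3) & 1;
        uint8_t p1 = (cw >> 4) & 1, p2 = (cw >> 5) & 1, p3 = (cw >> 6) & 1;
        uint8_t s1 = p1 ^ d1 ^ d2 ^ d4;
        uint8_t s2 = p2 ^ d1 ^ d3 ^ d4;
        uint8_t s3 = p3 ^ d2 ^ d3 ^ d4;
        return (uint8_t)(s1 | (s2 << 1) | (s3 << 2));
    }

    int main(void)
    {
        uint8_t cw = hamming74_encode(0xB);     /* encode data bits 1011 */
        cw ^= (1 << 2);                         /* inject a 1-bit fault  */
        printf("syndrome = %u\n", hamming74_syndrome(cw)); /* nonzero -> corrupt */
        return 0;
    }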

  19. Hamming (Back to Reality) • Redundancy • Can only recover from 1 bit corruption • Space • Almost constant (optimal # of parity bits) • Time • Lots of bitwise XORs and ANDs

  20. Majority Vote • Time to update: very fast! • Space overhead! • Simple implementation!! if (copy1 != copy2) use copy3; else everything is ok
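In C, the vote on three copies reduces to a few lines; this sketch assumes at most one copy is corrupted at a time:

    /* Minimal triple-redundancy majority vote for a long value. */
    static long majority_vote(long copy1, long copy2, long copy3)
    {
        if (copy1 == copy2)
            return copy1;      /* at least two copies agree            */
        return copy3;          /* copy1 or copy2 is bad; copy3 decides */
    }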

  21. Part IV – Implementation

  22. Design Goals • Want a “redundancy repository” for entire kernel • Minimize Programmer’s Pain! • On demand backup • Scalability

  23. “Just give me a location and I’ll take care of you!” - Redundancy Repository

  24. Redundancy Repository • A redundancy hash table of member entries • Each entry holds: int size, long id, char parity
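Read as a C struct, the member entry might look like the following; the layout is inferred from the slide, not taken from the actual implementation:

    /* Sketch of a redundancy-repository hash-table entry. */
    struct redundancy_entry {
        long id;        /* key: identity (address) of the protected member */
        int  size;      /* size of the protected member, in bytes          */
        char parity;    /* stored parity / redundancy for the member       */
    };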

  25. How to Protect? Redundancy API • checkParity( addressOfMember, size ): add before a read access • updateParity( addressOfMember, addressOfNewValue, size ): add before an update
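A hypothetical use of the two calls around a protected task_struct field; the prototypes are assumed from the slide, and the surrounding functions are ours:

    #include <linux/sched.h>

    /* Prototypes assumed from the slide; exact signatures are a guess. */
    void checkParity(void *addressOfMember, int size);
    void updateParity(void *addressOfMember, void *addressOfNewValue, int size);

    /* Read a protected field: verify its redundancy first. */
    static int read_prio_protected(struct task_struct *task)
    {
        checkParity(&task->prio, sizeof(task->prio));
        return task->prio;
    }

    /* Update a protected field: refresh the redundancy before the write. */
    static void write_prio_protected(struct task_struct *task, int new_prio)
    {
        updateParity(&task->prio, &new_prio, sizeof(new_prio));
        task->prio = new_prio;
    }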

  26. Some Challenges • Dealing with different-sized data members • Originally we focused on protecting only the address; the solution requires knowing the size of the data as well • What about recursive redundancy? • User registration • Manual integration

  27. Updated Results Di + Kernel + Solution → Harmony

  28. Summary • 20% of the critical data members we tested caused a crash. • Finding every location that updates memory is difficult. • The system no longer crashed with our redundancy solution.

  29. Thank You • Jeffrey Freschl jfreschl@cs.wisc.edu • Di Xue goldenspaceship@gmail.com
